In order to develop methodologies that are useful in the design of complex
systems, existing designs must be studied. The DEC PDP-11 was selected for a case
study since there are a number of designs (eight considered here), the designs span a
wide range in basic performance (7:1) and component technology (bipolar SSI to MOS
LSI), and the designs represent relatively complex systems.

The goals of the paper are two-fold: 1) to provide actual data about design 
tradeoffs and 2) to suggest design methodologies based on this data. An archetypical
PDP-11 implementation is described, followed by model-specific variations. These
variations represent the design tradeoffs which are classified by area: technology, 
control, and data path. 

Two methodologies are presented. A top-down approach uses microcycle and 
memory read pause times to account for 90% of the variation in processor
performance. This approach can be used in initial system planning. A bottom-up 
approach uses relative frequency of functions to determine the impact of design 
tradeoffs on performance. This approach can be used in design space exploration of a 
single design. Finally, the general cost/performance design tradeoffs used in the 
PDP-11 are summarized.

1. Introduction

As semiconductor technology has evolved, the digital systems designer has been 
presented with an ever increasing set of primitive components from which to construct 
systems: standard SSI, MSI, and LSI as well as custom LSI components. This expanding 
choice makes it more difficult to arrive at a near-optimal cost/performance ratio in a 
design. In the case of highly complex systems, the situation is even worse since 
different primitives may be cost effective in different subareas of such systems. 

Historically, digital system design has been more of an art than a science. Good 
designs evolved from a mixture of experience, intuition, and trial and error. Only 
rarely have design methodologies been developed (e.g. two-level combinatorial logic 
minimization, wire wrap routing schemes, etc.). Effective design methodologies are 
essential for the cost-effective design of more complex systems. In addition, if the 
methodologies are sufficiently detailed, they can be applied in high-level design 
automation systems [Siew76]. 

Design methodologies may be developed by studying the results of the human 
design process. There are at least two ways to study this process. The first involves 
a controlled design experiment where several designers perform the same task. By 
contrasting the results, the range of design variation and technique can be established 
[Thom77]. However, this approach is limited to fairly small design situations due to the 
redundant use of the human designers. 

The second approach examines a series of existing designs that meet the same 
functional specification while spanning a wide range of design constraints in terms of 
cost, performance, etc. This paper takes the second approach and uses the DEC
PDP-11* minicomputer line as a basis for study. The PDP-11 was selected because of the
large number of implementations (eight are considered here) with designs spanning a
wide range in performance (roughly 7:1) and component technology (bipolar SSI, MSI,
MOS custom LSI). The designs are relatively complex and seem to embody good design
tradeoffs as ultimately reflected by their price/performance and commercial success.

The design tradeoffs considered fall into three categories: circuit technology, 
control unit implementation, and data path topology. All three will be seen to have 
considerable impact on performance. Attention here is focused mainly upon the CPU. 
Memory performance enhancements such as caching are considered only insofar as 
they impinge upon CPU performance. 

This paper is divided into three major parts. The first part (Section 2) provides 
an overview of the PDP-11 functional specification (i.e., the architecture) and serves as
background for subsequent discussion of design tradeoffs. The second part (Sections 
3, 4, 5) presents an archetypical implementation followed by the model-specific 
variations from the archetype. These variations represent the design tradeoffs. The 
last part (Sections 6 and 7) presents methodologies for determining the impact of
various design parameters on system performance. The magnitude of the impact is
quantified for several parameters and the use of the results in design situations
discussed.

* DEC, PDP, LSI-11, UNIBUS, and Fastbus are registered trademarks of Digital
Equipment Corporation.


2. Architectural Overview 

The PDP-11 family is a set of small- to medium-scale stored-program central 
processors with compatible instruction sets [Bell70]. The family evolution in terms of 
increased performance, constant cost, and constant performance successors is traced 
in Figure 1². Since the 11/45, 11/55 and 11/70 use the same processor, only the
11/45 is treated in this study. 



Figure 1: PDP-11 Family Tree 


² The original equipment manufacturer (OEM) versions of the 11/10, 11/20, and 11/40
are the 11/05, 11/15, and 11/35 respectively. The OEM machines are electrically 
identical (or nearly so) to their end-user counterparts, the distinction being made for 
marketing purposes only. 

A PDP-11 system consists of three parts: a PDP-11 processor, a collection of
memories and peripherals, and a link called the UNIBUS over which they all 
communicate (Figure 2). 



(Figure courtesy of Digital Equipment Corporation)


Figure 2: Typical PDP-11 Configuration 


A number of features, not otherwise considered here, are available as options 
on certain processors. These include memory management and floating-point 
arithmetic. The next three subsections summarize the major architectural features of 
the PDP-11 including memory organization, processor state, addressing modes, 
instruction set, and UNIBUS protocol. The references list a number of processor 
handbooks and other documents which provide a more precise definition of the PDP-11 
architecture than is possible here. 


2.1 Memory and Processor State 

The central processor contains the control logic and data paths for instruction 
fetching and execution. Processor instructions act upon operands located either in 
memory or in one of eight general registers. These operands may be either 8-bit 
bytes or 16-bit words. 

Memory is byte or word addressable. Word addresses must be even. If N is a 
word address, then N is the byte address of the low-order byte of the word and N+1
is the byte address of the high-order byte of the word (Figure 3). 
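
The byte ordering just described can be illustrated with a minimal Python sketch. The helper names `word_byte_addresses` and `read_word`, and the use of a `bytearray` as memory, are illustrative assumptions and not part of any DEC documentation.

```python
def word_byte_addresses(n: int) -> tuple[int, int]:
    """Return (low-order byte address, high-order byte address) of the word at n."""
    if n % 2 != 0:
        raise ValueError("word addresses must be even")
    return n, n + 1


def read_word(memory: bytearray, n: int) -> int:
    """Assemble a 16-bit word from its two bytes, low-order byte at address n."""
    low, high = word_byte_addresses(n)
    return memory[low] | (memory[high] << 8)


# Hypothetical 8-byte memory holding the word 0x1234 at word address 4.
memory = bytearray(8)
memory[4], memory[5] = 0x34, 0x12
word = read_word(memory, 4)
```

Here `word` is 0x1234, since the byte at the even address supplies the low-order half.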

The control and data registers of peripheral devices are also accessed through 
the memory address space and the top 4K words of the space are reserved for this 
purpose. 

The general registers are 16 bits in length and are referred to as R0 through
R7. R6 is used as the system stack pointer (SP) to maintain a push-down list in
memory upon which subroutine and interrupt linkages are kept. R7 is the program
counter (PC) and always points to the next instruction to be fetched from memory.
With minor exceptions (noted below) the SP and PC are accessible in exactly the same
manner as any of the other general registers (R0 through R5).

(Figure courtesy of Digital Equipment Corporation)

Figure 3: PDP-11 Byte and Word Addressing

Data manipulation instructions fall into two categories: arithmetic instructions 
(which interpret their operands as two’s complement integers) and logical instructions 
(which interpret their operands as bit vectors). A set of condition code flags is 
maintained by the processor and is updated according to the sign and presence of 
carry/overflow from the result of any data manipulation instruction. The condition 
codes, processor interrupt priority, and a flag enabling program execution tracing are 
contained in a processor status word (PS), which is accessible as a word in the memory 
addressing space. 
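
As an illustration of how the condition codes follow from a result, the sketch below computes N, Z, V, and C for a 16-bit two's complement addition. It is a simplified model under stated assumptions: the function name `add_word` and the dictionary representation of the flags are hypothetical, and the overflow and carry rules are the conventional ones for two's complement addition rather than a transcription of any particular PDP-11 implementation.

```python
MASK16 = 0xFFFF
SIGN16 = 0x8000


def add_word(s: int, d: int) -> tuple[int, dict[str, int]]:
    """Add two 16-bit operands; return (result, condition codes)."""
    s &= MASK16
    d &= MASK16
    total = s + d
    result = total & MASK16
    flags = {
        "N": int(bool(result & SIGN16)),   # sign of the result
        "Z": int(result == 0),             # result is zero
        # Overflow: both operands share a sign that the result lacks.
        "V": int(bool(~(s ^ d) & (s ^ result) & SIGN16)),
        "C": int(total > MASK16),          # carry out of bit 15
    }
    return result, flags


result, flags = add_word(0x7FFF, 0x0001)
```

Adding 1 to the largest positive word yields 0x8000 with N and V set and Z and C clear.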


2.2 Addressing Modes and Instruction Set 

The PDP-11 instruction set allows source and destination operands to be 
referenced via eight different addressing modes. An operand reference consists of a 
field specifying which of the eight modes is to be used and a second field specifying 
which of the eight general registers is to be used. The addressing modes are: 

Mode 0 - Register - The operand is contained in the specified register. 

Mode 1 - Register deferred - The contents of the specified register are used 
to address the memory location containing the operand. 

Mode 2 - Autoincrement - The contents of the specified register are used to 
address the memory location containing the operand after which the 
register is incremented. 

Mode 3 - Autoincrement deferred - The contents of the specified register
address a word in memory containing the address of the operand in 
memory. The specified register is incremented after the reference. 

Mode 4 - Autodecrement - The contents of the specified register are first
decremented and then used to address the memory location containing the 
operand. 

Mode 5 - Autodecrement deferred - The contents of the specified register are 
first decremented and then used to address a word in memory containing 
the address of the operand in memory. 

Mode 6 - Indexed - The word following the instruction is fetched and added 
to the contents of the specified general register to form the address of 
the memory location containing the operand. 

Mode 7 - Indexed deferred - The word following the instruction is fetched and 
added to the contents of the specified general register to form the 
address of a word in memory containing the address of the operand in 
memory. 

The various addressing modes simplify the manipulation of diverse data structures such 
as stacks, tables, etc. When used with the program counter these modes enable 
immediate operands, absolute, and PC-relative addressing. The deferred modes permit 
indirect addressing. 

Autoincrement/autodecrement modes operate differently for byte and word 
instructions. When a byte is referenced, the increment/decrement is by 1. In 
references to words (including addresses in the deferred modes) the increment/
decrement is by 2. The use of R6 (SP) or R7 (PC) with these modes is an exceptional 
case. Since they generally must point to word addresses because of their use by the 
processor, R6 and R7 are always incremented/decremented by 2 and a word transfer 
made, even with byte instructions. 
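
The eight modes and the increment/decrement rules above can be summarized in a short Python sketch. This is a behavioral model only, not a description of any processor's microcode. The names `step_size` and `effective_address`, the list `regs`, and the `read_word` callback are assumptions introduced for illustration.

```python
SP, PC = 6, 7


def step_size(mode: int, reg: int, byte_op: bool) -> int:
    """Increment/decrement amount for modes 2 through 5."""
    deferred = mode in (3, 5)          # deferred modes step over an address word
    if byte_op and not deferred and reg not in (SP, PC):
        return 1
    return 2


def effective_address(mode, reg, regs, read_word, byte_op=False):
    """Return the operand's memory address, or None if it is in regs[reg]."""
    if mode == 0:                      # register
        return None
    if mode == 1:                      # register deferred
        return regs[reg]
    if mode in (2, 3):                 # autoincrement (deferred)
        addr = regs[reg]
        regs[reg] = (regs[reg] + step_size(mode, reg, byte_op)) & 0xFFFF
        return addr if mode == 2 else read_word(addr)
    if mode in (4, 5):                 # autodecrement (deferred)
        regs[reg] = (regs[reg] - step_size(mode, reg, byte_op)) & 0xFFFF
        addr = regs[reg]
        return addr if mode == 4 else read_word(addr)
    if mode in (6, 7):                 # indexed (deferred)
        index = read_word(regs[PC])    # word following the instruction
        regs[PC] = (regs[PC] + 2) & 0xFFFF
        addr = (regs[reg] + index) & 0xFFFF
        return addr if mode == 6 else read_word(addr)
    raise ValueError("mode must be in the range 0..7")
```

Because the PC is an ordinary register in this model, mode 2 with R7 returns the address of the word following the instruction (an immediate operand), and mode 6 with R7 adds the index word to the already updated PC (PC-relative addressing).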

The PDP-11 instruction set is made up of the following types of instructions: 

Single-operand instructions - A destination operand is fetched by the CPU, 
modified in accordance with the instruction, and then restored to the 
destination. 

Double-operand instructions - A source operand is fetched followed by the
destination operand. The appropriate operation is performed on the two
operands and the result restored to the destination. In a few double-operand
instructions such as exclusive OR (XOR), source mode 0 (register
addressing) is implicit.

Branch instructions - The condition specified by the instruction is checked, 
and if true, a branch is taken using a field contained in the instruction as a 
displacement from the current instruction address. 

Jumps - Jump instructions allow sequential program flow to be altered either
permanently (jump) or temporarily (jump to subroutine). 

Control, trap, and miscellaneous instructions - Various instructions are 
available for subroutine and interrupt returns, halts, etc. 

Floating-point instructions - A floating-point processor is available as an 
option with several PDP-11 CPUs. Floating-point implementation will not 
be considered in this paper. 

A summary of PDP-11 addressing modes, instruction set, and other programming 
information is given in Table 1. 
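
The instruction formats can be made concrete with a small decoding sketch. The field positions follow the double-operand format (a 4-bit op code followed by 6-bit source and destination fields, each a 3-bit mode and a 3-bit register) and the 8-bit signed branch offset. The function names are hypothetical, and the branch-target rule (updated PC plus twice the offset) is the conventional PDP-11 definition, stated here as an assumption rather than taken from the text above.

```python
def decode_double_operand(word: int) -> dict[str, int]:
    """Split a double-operand instruction word into its fields."""
    return {
        "opcode": (word >> 12) & 0o17,
        "src_mode": (word >> 9) & 0o7,
        "src_reg": (word >> 6) & 0o7,
        "dst_mode": (word >> 3) & 0o7,
        "dst_reg": word & 0o7,
    }


def branch_target(instr_addr: int, word: int) -> int:
    """Compute the target of a branch instruction located at instr_addr."""
    offset = word & 0xFF
    if offset & 0x80:                  # sign-extend the 8-bit offset
        offset -= 0x100
    updated_pc = instr_addr + 2        # PC already points past the branch
    return (updated_pc + 2 * offset) & 0xFFFF


fields = decode_double_operand(0o010122)    # MOV R1,(R2)+
target = branch_target(0o1000, 0o000777)    # BR with offset -1
```

In this example `fields` shows source mode 0 with R1 and destination mode 2 with R2, and the branch with offset -1 targets its own address.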

For the purposes of looking at the instruction execution cycle of the various 
PDP-11 processors, each cycle shall be broken into five distinct phases³:

Fetch - This phase consists of fetching the current instruction from memory 
and interpreting its opcode. 

Source - This phase entails fetching the source operand for double operand 
instructions from memory or a general register and loading it into the 
appropriate register in the data paths in preparation for the execute 
phase. 

Destination - This phase is used to get the destination operand for single and 
double operand instructions into the data paths for manipulation in the 
execute phase. For JMP and JSR instructions the jump address is
calculated.

Execute - During this phase the operation specified by the current instruction 
is performed and any result rewritten into the destination. 

Service - This phase is only entered between execution of the last instruction 
and fetch of the next to grant a pending bus request, acknowledge an 
interrupt, or enter console mode after the execution of a HALT instruction 
or activation of the console halt key. 

The transitions from phase to phase are indicated in Figure 4. 
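
The phase sequencing can be sketched as a simple function, following the skip conditions annotated in Figure 4: the source and destination phases are skipped when the instruction has no corresponding operand, and the service phase is skipped when nothing is pending. The function name and boolean parameters are illustrative assumptions.

```python
def phase_sequence(has_source: bool, has_destination: bool,
                   service_pending: bool) -> list[str]:
    """Return the phases traversed while interpreting one instruction."""
    phases = ["fetch"]
    if has_source:
        phases.append("source")
    if has_destination:
        phases.append("destination")
    phases.append("execute")
    if service_pending:
        phases.append("service")
    return phases


# A double-operand instruction with no pending bus request or interrupt.
double_operand = phase_sequence(True, True, False)
# A single-operand instruction followed by an interrupt acknowledgement.
single_operand = phase_sequence(False, True, True)
```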


³ N.B.: The names are identical to those used by DEC to refer to instruction phases;
however, their application here to a state within a given machine may differ from
DEC's since the attempt here is to make the discussion consistent over all machines.

SINGLE OPERAND: OPR dst

Format: bits 15-6 op code | bits 5-0 DD

Mnemonic   Op Code   Instruction           dst Result    N Z V C

General
CLR(B)     ■050DD    clear                 0             0 1 0 0
COM(B)     ■051DD    complement (1's)      ~d            * * 0 1
INC(B)     ■052DD    increment             d + 1         * * * -
DEC(B)     ■053DD    decrement             d - 1         * * * -
NEG(B)     ■054DD    negate                -d            * * * *
TST(B)     ■057DD    test                  d             * * 0 0

Rotate & Shift
ROR(B)     ■060DD    rotate right          → C, d        * * * *
ASR(B)     ■062DD    arith shift right     d/2           * * * *
ASL(B)     ■063DD    arith shift left      2d            * * * *

Multiple Precision
ADC(B)     ■055DD    add carry             d + C         * * * *
SBC(B)     ■056DD    subtract carry        d - C         * * * *
▲SXT       0067DD    sign extend           0 or -1       - * 0 -

Mode   Name                 Symbolic   Description
0      register             R          (R) is operand
1      register deferred    (R)        (R) is adrs
2      auto-increment       (R)+       (R) is adrs; (R) + (1 or 2)
3      auto-incr deferred   @(R)+      (R) is adrs of adrs; (R) + 2
4      auto-decrement       -(R)       (R) - (1 or 2); (R) is adrs
5      auto-decr deferred   @-(R)      (R) - 2; (R) is adrs of adrs
6      index                X(R)       (R) + X is adrs
7      index deferred       @X(R)      (R) + X is adrs of adrs

Program counter addressing:
2      immediate            #n         operand n follows instr
3      absolute             @#A        address A follows instr

LEGEND:

Op Codes
■   = 0 for word / 1 for byte
SS  = source field (6 bits)
DD  = destination field (6 bits)
R   = gen register (3 bits), 0 to 7
XXX = offset (8 bits), +127 to -128
N   = number (3 bits)
NN  = number (6 bits)

Operations
( ) = contents of
s   = contents of source
d   = contents of destination
r   = contents of register
←   = becomes
X   = relative address
%   = register definition

Boolean
∧ = AND
∨ = inclusive OR
⊻ = exclusive OR
~ = NOT

Condition Codes
* = conditionally set/cleared
- = not affected
0 = cleared
1 = set

NOTE:
▲ = Applies to the 11/35, 11/40, 11/45 & 11/70 computers
● = Applies to the 11/45 & 11/70 computers

DOUBLE OPERAND: OPR src, dst    OPR src, R or OPR R, dst

Format: bits 15-12 op code | bits 11-6 SS | bits 5-0 DD
Format: bits 15-9 op code | bits 8-6 R | bits 5-0 SS or DD

Mnemonic   Op Code   Instruction            Operation        N Z V C

General
MOV(B)     ■1SSDD    move                   d ← s            * * 0 -
CMP(B)     ■2SSDD    compare                s - d            * * * *
ADD        06SSDD    add                    d ← s + d        * * * *
SUB        16SSDD    subtract               d ← d - s        * * * *

Logical
BIT(B)     ■3SSDD    bit test (AND)         s ∧ d            * * 0 -
BIC(B)     ■4SSDD    bit clear              d ← (~s) ∧ d     * * 0 -
BIS(B)     ■5SSDD    bit set (OR)           d ← s ∨ d        * * 0 -

▲Register
MUL        070RSS    multiply               r ← r × s        * * 0 *
DIV        071RSS    divide                 r ← r / s        * * * *
ASH        072RSS    shift arithmetically                    * * * *
ASHC       073RSS    arith shift combined                    * * * *
XOR        074RDD    exclusive OR           d ← r ⊻ d        * * 0 -

(Table courtesy of Digital Equipment Corporation)

Table 1: PDP-11 Programming Summary

BRANCH:

Format: bits 15-8 base code | bits 7-0 XXX
Op Code = Base Code + XXX

Mnemonic   Base Code   Instruction                   Branch Condition

Branches
BR         000400      branch (unconditional)        (always)
BNE        001000      br if not equal (to 0)        ≠ 0     Z = 0
BEQ        001400      br if equal (to 0)            = 0     Z = 1
BPL        100000      branch if plus                +       N = 0
BMI        100400      branch if minus               -       N = 1
BVC        102000      br if overflow is clear               V = 0
BVS        102400      br if overflow is set                 V = 1
BCC        103000      br if carry is clear                  C = 0
BCS        103400      br if carry is set                    C = 1

Signed Conditional Branches
BGE        002000      br if greater or eq (to 0)    ≥ 0     N ⊻ V = 0
BLT        002400      br if less than (0)           < 0     N ⊻ V = 1
BGT        003000      br if greater than (0)        > 0     Z ∨ (N ⊻ V) = 0
BLE        003400      br if less or equal (to 0)    ≤ 0     Z ∨ (N ⊻ V) = 1

Unsigned Conditional Branches
BHI        101000      branch if higher              >       C ∨ Z = 0
BLOS       101400      branch if lower or same       ≤       C ∨ Z = 1
BHIS       103000      branch if higher or same      ≥       C = 0
BLO        103400      branch if lower               <       C = 1

MISCELLANEOUS:

Mnemonic   Op Code   Instruction
HALT       000000    halt
WAIT       000001    wait for interrupt
RESET      000005    reset external bus
NOP        000240    (no operation)
●SPL       00023N    set priority level (to N)
▲MFPI      0065SS    move from previous instr space
▲MTPI      0066DD    move to previous instr space
●MFPD      1065SS    move from previous data space
●MTPD      1066DD    move to previous data space

CONDITION CODE OPERATORS:

Format: bits 15-5 op code base (000240) | bit 4: 0 = clear selected cond code bits,
        1 = set selected cond code bits | bits 3-0: N Z V C

Mnemonic   Op Code   Instruction          N Z V C
CLC        000241    clear C              - - - 0
CLV        000242    clear V              - - 0 -
CLZ        000244    clear Z              - 0 - -
CLN        000250    clear N              0 - - -
CCC        000257    clear all cc bits    0 0 0 0
SEC        000261    set C                - - - 1
SEV        000262    set V                - - 1 -
SEZ        000264    set Z                - 1 - -
SEN        000270    set N                1 - - -
SCC        000277    set all cc bits      1 1 1 1

JUMP & SUBROUTINE:

Mnemonic   Op Code   Instruction                  Notes
JMP        0001DD    jump                         PC ← dst
JSR        004RDD    jump to subroutine           use same R
RTS        00020R    return from subroutine       use same R
▲MARK      0064NN    mark                         aid in subr return
▲SOB       077RNN    subtract 1 & br (if ≠ 0)     (R) - 1, then if (R) ≠ 0:
                                                  PC ← Updated PC - (2 × NN)

TRAP & INTERRUPT:

Mnemonic   Op Code             Instruction                Notes
EMT        104000 to 104377    emulator trap              PC at 30, PS at 32
                               (not for general use)
TRAP       104400 to 104777    trap                       PC at 34, PS at 36
BPT        000003              breakpoint trap            PC at 14, PS at 16
IOT        000004              input/output trap          PC at 20, PS at 22
RTI        000002              return from interrupt
▲RTT       000006              return from interrupt      inhibit T bit trap

PROCESSOR REGISTER ADDRESSES:

Processor Status Word                  PS - 777 776
[Diagram: processor status word fields, low-order first: carry, overflow, zero,
negative, trace trap, gen reg set ●, previous mode ●, current mode ●;
mode codes 00 = kernel ●, 01 = supervisor ●]

▲Stack Limit Register                  777 774
●Program Interrupt Request             777 772
Console Switches & Display Register    777 570

General Registers (console use only) (not for 11/45)
R0 - 777 700        R4 - 777 704
R1 - 777 701        R5 - 777 705
R2 - 777 702        R6 - 777 706
R3 - 777 703        R7 - 777 707

Table 1 (continued): PDP-11 Programming Summary

[Figure 4 flow: Fetch -> Source -> Destination -> Execute -> Service -> Fetch.
Skip source phase if instruction does not use a source operand. Skip destination
phase if instruction does not use a destination operand. Skip service phase if
there is no serviceable condition and processor is in run state.]

Figure 4: PDP-11 Instruction Interpretation Cycle


2.3 The UNIBUS 

All communication among the components of a PDP-11 system takes place on a 
set of bidirectional lines referred to collectively as the UNIBUS. The LSI-11 is an 
exception and uses an adaptation of the UNIBUS as explained in Section 4. The 
UNIBUS lines carry address, data, and control signals to all memories and peripherals 
attached to the CPU. Transactions on the UNIBUS are asynchronous with the 
processor. At any given time there will be one device which is bus master. The bus 
master may initiate communication with any device which it addresses, the addressed 
device becoming the bus slave. This communication may consist of data transfers or, in 
the case of the processor being slave, an interrupt request. The data transfers which 
may be initiated by the master are: 

DATO - Data out - A word is transferred from master to slave. 

DATOB - Data out, byte - A byte is transferred from master to slave. 


DATI - Data in - A word is transferred from slave to master. 






DATIP - Data in, pause - A word is transferred from slave to master and the 
slave awaits a transfer from master back to slave to replace the 
information that was read. The UNIBUS control allows no other data 
transfer to intervene between the read and the write cycles. This makes 
possible the reading and alteration of a memory location as an indivisible 
operation. In addition it permits the use of a read/modify/write cycle 
with core memories in place of the longer sequence of a read cycle 
followed by a write cycle. 
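
The indivisibility that DATIP provides can be sketched with a toy model of a memory slave. This is an illustration only: the class `MemorySlave`, its methods, and the function `increment_in_place` are hypothetical, and the model ignores all electrical and timing detail of the real bus protocol.

```python
class MemorySlave:
    """Toy word-addressed memory that honors the DATIP pause."""

    def __init__(self):
        self.words = {}
        self.paused_at = None    # address awaiting the write that ends a DATIP

    def dati(self, addr):
        """Data in: transfer a word from slave to master."""
        if self.paused_at is not None:
            raise RuntimeError("bus is held by a pending DATIP")
        return self.words.get(addr, 0)

    def datip(self, addr):
        """Data in, pause: read a word and await its replacement."""
        value = self.dati(addr)
        self.paused_at = addr
        return value

    def dato(self, addr, value):
        """Data out: transfer a word from master to slave."""
        if self.paused_at not in (None, addr):
            raise RuntimeError("bus is held by a pending DATIP")
        self.words[addr] = value & 0xFFFF
        self.paused_at = None


def increment_in_place(slave, addr):
    """Read, modify, and rewrite one location as an indivisible operation."""
    value = slave.datip(addr)
    slave.dato(addr, value + 1)
    return slave.words[addr]
```

Between the `datip` and the matching `dato`, the model rejects any other transfer, which mirrors the statement that no other data transfer may intervene between the read and the write.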


3. Implementation of Medium-Performance PDP-11s

The broad middle range of PDP-11s have comparable implementations, yet their
performances vary by a factor of two. The processors making up this group are the
PDP-11/04, 11/10, 11/20, 11/34, 11/40, and 11/60. This section discusses the
features common to these implementations and the variations between machines
that provide the dimensions along which they may be characterized.


3.1 Common Implementation Features 

All PDP-11 implementations, be they low-, medium-, or high-performance, can be
decomposed into a set of data paths and a control unit. The data paths store and
operate upon byte and word data and interface to the UNIBUS, permitting them to read
from and write to memory and peripheral devices. The control unit provides all the
signals necessary to evoke the appropriate operations in the data paths and UNIBUS
interface. Midrange PDP-11s have comparable data path and control unit
implementations, allowing them to be contrasted in a uniform way. In this section a
basis for comparing these machines is established and used to characterize them.


3.1.1 Data Paths 

An archetype may be constructed from which the data paths of all midrange 
PDP-11s differ but minimally. This archetype is diagrammed in Figure 5. All major
registers and processing elements as well as the links and switches which interconnect 
them are indicated. The data path illustrations for individual implementations are 
grouped with Figure 5 at the end of the paper. These figures are laid out in a 
common format to encourage comparison. Note that with very few exceptions, all data 
paths are 16 bits wide (PDP-11 word size). 

The heart of the data paths is the arithmetic/logic unit or ALU through which all 
data circulates and where most of the processing actually takes place. Among the 
operations performed by the ALU are addition, subtraction, ones and twos 
complementation, and logical ANDing and ORing. 

The inputs to the ALU are the A leg and the B leg. The A leg is normally fed
from a multiplexor (Aleg MUX) which selects among an operand supplied by the
scratchpad memory (SPM) and possibly a small set of constants and/or the
processor status register (PS). The B leg is also typically fed from its own MUX (Bleg
MUX), which selects among the B register and certain constants. In addition the
Bleg MUX may be configured so that byte selection, sign extension, and other functions
may be performed on the operand which it supplies to the ALU.

Following the ALU is a multiplexor (the AMUX) typically used to select between
the output of the ALU, the data lines of the UNIBUS, and certain constants. The output 
of the AMUX provides the only feedback path in all midrange PDP-11 implementations 
except the 11/60 and acts as an input to all major processor registers. 

The internal registers lie at the beginning of the data paths. The instruction 
register (IR) contains the current instruction. The bus address register (BA) holds the 
address placed on the UNIBUS by the processor. The processor status register (PS)
contains the processor priority, memory-management-unit modes, condition code flags,
and instruction trace trap enable bit. The scratchpad memory (SPM) is an array of
sixteen individually addressable registers which include the general registers (R0-R7)
plus a number of internal registers not accessible to the programmer. The B register 
(Breg) is used to hold the B leg operand supplied to the ALU. 

The variations from this archetype are minor, as will be seen in Subsection
3.2. Variations to be encountered include routings for bus address and
processor status register, the point of generation for certain constants, the positioning
of the byte swapper, sign extender, and rotate/shift logic, and the use of certain
auxiliary registers present in some designs and not others. In general these variations
are all peripheral to the major elements and interconnections of the data paths.


3.1.2 Control Unit 

The control unit for all PDP-11 processors (with the exception of the
PDP-11/20) is microprogrammed [Wilk53]. The considerations leading to the use of
this style of control implementation in the PDP-11 are discussed in [OLou75]. The
major advantage of microprogramming is flexibility in the derivation of control signals
to gate register transfers, synchronize with UNIBUS logic, control microcycle timing,
and evoke changes in control flow. The way in which a microprogrammed control unit
accomplishes all of these actions impacts performance.

Figure 6 represents the archetypical PDP-11 microprogrammed control unit. 
The contents of the microaddress register determine the current control unit state and 
are used to access the next microinstruction word from the control store. Pulses from 
the clock generator strobe the microword and microaddress registers loading them 
with the next microword and next microaddress respectively. Repeated clock pulses 
thus cause the control unit to sequence through a series of states. The period spent 
by the control unit in one state is called a microcycle (or simply cycle when this does 
not lead to confusion with memory or instruction cycles) and the duration of the state 
as determined by the clock is known as the cycle time. The microword register 
shortens cycle time by allowing the next microword to be fetched from the control 
store while the current microword is being used. 

Most of the fields of the microword supply signals for conditioning and clocking
the data paths. Many of the fields act directly or with a small amount of decoding, 
supplying their signals to multiplexors and registers to select routings for data and to 
enable registers to shift, increment, or load on the master clock. Other fields are 
decoded based upon the state of the data paths. An instance of this is the use of 
auxiliary ALU control logic to generate function select signals for the ALU as a function 
of the instruction contained in the IR. Performance as determined by microcycle count 
is in large measure established by the connectivity of the data paths and the degree to 
which their functionality can be evoked by the data path control fields of the 
microprogram word. 

The complexity of the clock logic varies with each implementation. Typically the 
clock is fixed at a single period and duty cycle; however, processors such as the 11/34 
and 11/40 can select from two or three different clock periods for a given cycle 
depending upon a field in the microword register. This can significantly improve 
performance in machines where the longer cycles are necessary only infrequently. 
The clock logic must provide some means for synchronizing processor and UNIBUS 
operation since the two operate asynchronously with respect to one another. Two 
alternate approaches are employed in midrange implementations. Interlocked 
operation, the simpler approach, shuts off the processor clock when a UNIBUS
operation is initiated and turns it back on when the operation is complete. This 
effectively keeps microprogram flow and UNIBUS operation in lockstep with no overlap. 
Overlapped operation is a somewhat more involved approach which continues 
processor clocking after a DATI or DATIP is initiated. The microinstruction requiring 
the result of the operation has a function bit set which turns off the processor clock 
until the result is available. This approach makes it possible for the processor to 
continue running for several microcycles while a data transfer is being performed, 
improving performance. 
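
The benefit of overlap can be approximated with a simple timing sketch. The model assumes that, after initiating a read, the processor has some number of microcycles of work that does not depend on the data. Under interlocked operation that work waits for the bus; under overlapped operation it proceeds in parallel. The function names and the numbers in the example are hypothetical, and clock resynchronization delays are ignored.

```python
def interlocked_ns(bus_ns: int, independent_cycles: int, cycle_ns: int) -> int:
    """Clock stops for the whole bus operation, then the other work runs."""
    return bus_ns + independent_cycles * cycle_ns


def overlapped_ns(bus_ns: int, independent_cycles: int, cycle_ns: int) -> int:
    """Other work runs during the bus operation; the clock stops only if
    the data is still unavailable when it is finally needed."""
    return max(bus_ns, independent_cycles * cycle_ns)


# Hypothetical figures: 1000 ns memory read, 300 ns microcycle,
# two microcycles of work that do not need the fetched data.
slow = interlocked_ns(1000, 2, 300)
fast = overlapped_ns(1000, 2, 300)
```

With these illustrative figures the interlocked sequence takes 1600 ns and the overlapped one 1000 ns; the saving is bounded by the amount of independent work available.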

The sequence of states through which the control unit passes would be fixed if 
not for the branch-on-microtest (BUT) logic. This logic generates a modifier based 
upon the current state of the data paths and UNIBUS interface (contents of the 
instruction register, current bus requests, etc.) and a BUT field in the microword 
currently being accessed from the control store which selects the condition on which 
the branch is to be based. The modifier (which will be zero in the case that no branch 
is selected or that the condition is false) is ORed in with the next microinstruction 
address so that the next control unit state is not only a function of the current state 
but also a function of the state of the data paths as well. Instruction decoding and 
addressing mode decoding are two prime examples of the application of BUTs. Certain 
code points in the BUT field do not select branch conditions, but rather provide control 
signals to the data paths, UNIBUS interface, or the control unit itself. These are known 
as active or working BUTs. 
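
The ORing of a BUT modifier into the next microaddress can be sketched as follows. The control-store addresses, the function names, and the particular branch condition (destination mode taken from bits 5-3 of the IR) are hypothetical choices made for illustration; actual BUT conditions and address assignments differ from machine to machine.

```python
def next_microaddress(next_field: int, modifier: int) -> int:
    """OR the BUT modifier into the microword's next-address field."""
    return next_field | modifier


def destination_mode_modifier(ir: int) -> int:
    """Hypothetical BUT condition: the destination mode, IR bits 5-3."""
    return (ir >> 3) & 0o7


# Hypothetical layout: the microword names base address 0o40, whose low
# three bits are zero, so the eight modes dispatch to 0o40 through 0o47.
base = 0o40
ir = 0o005022                      # CLR (R2)+, destination mode 2
target = next_microaddress(base, destination_mode_modifier(ir))
```

Because the modifier is ORed rather than added, the base address must have zeros in every bit position the modifier can set; this constrains how microroutines are placed in the control store.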

The JAM logic is a part of the microprogram flow-altering mechanism. This logic 
forces the microaddress register to a known state in the event of an exceptional 
condition such as a memory access error (bus timeout, stack overflow, parity error, 
etc.) or power up by ORing all ones into the next microaddress through the BUT logic. 
A microroutine beginning at the all-ones address handles these trapped conditions. 
The old microaddress is not saved (an exception to this occurs in the case of the 
PDP-11/60); consequently, the interrupted microprogram sequence is lost and the 
microtrap ends by restarting the instruction interpretation cycle with the fetch phase. 

The structure of the microprogram is determined largely by the BUTs available 
to implement it and by the degree to which special cases in the instruction set are 
exploited by these BUTs. This may have a measurable influence on performance as in 
the case of instruction decoding. The fetch phase of the instruction cycle is concluded 
by a BUT that branches to the appropriate point in the microcode based upon the 
contents of the instruction register. This branch can be quite complex since it is based 
upon source mode for double operand instructions, destination mode for single operand 
instructions, and opcode for all other types of instructions. Some processors can 
perform the execute phase of certain instructions like set/clear condition code during 
the last cycle of the fetch phase, meaning that the fetch or service phases for the next 
instruction might also be entered from BUT IRDECODE. Complicating the situation is the 
large number of possibilities for each phase. For instance, there are not only eight 
different destination addressing modes, but also subcases for each that vary for byte 
and word and for memory modifying, memory non-modifying, MOV, and JMP/JSR 
instructions. 

Some PDP-11 implementations such as the 11/10 make as much use of common 
microcode as possible to reduce the number of control states. This allows much of the 
IR decoding to be deferred until some time into a microroutine which might handle a 
number of different cases. For instance, byte and word operand addressing is done by 
the same microroutine in a number of PDP-11s. With the cost of control states 
dropping with the cost of control store ROM, there has been a trend toward providing 
separate microroutines optimized for each special case as in the 11/60. Thus more 
special cases must be broken out at the BUT IRDECODE, making the logic to implement 
this BUT increasingly involved. There is a payoff, though, because there is a smaller 
number of control states for IR decoding and fewer BUTs. Performance is boosted as 
well since frequently occurring special cases such as MOV register to destination can 
be optimized. 


3.1.3 Typical Instruction Interpretation Cycle 

To get a feel for the PDP-11 data paths and control unit in operation, consider 
the interpretation of a representative instruction by the archetypical PDP-11. The 
instruction to be followed is a word bit set (BIS), an instruction which takes its source 
operand, logically ORs it with the destination operand, and returns the result to the 
destination. Register addressing with register 2 is used for the source, indexed 
addressing with register 7 used for the destination. This means that general register 2 
will supply the source operand; the destination operand is in a memory location with 
address calculated by adding the contents of register 7 to the contents of the memory 
location following the instruction. Since register 7 is the program counter, the index 
following the instruction is effectively a displacement from the instruction to the 
destination operand. 

What follows is the sequence of microinstructions evoked during the execution 
of the macroinstruction described above. Each microinstruction is numbered and 
consists of the register transfers and any UNIBUS operation or branch-on-microtest 
initiated by the microword. 


Notation used in microinstructions: 

B        = B register 
BA       = bus address register 
BUSDATA  = UNIBUS data lines 
CLKOFF   = stop the processor clock until a UNIBUS transaction is completed, 
           used for processor/UNIBUS overlap 
IR       = instruction register 
PC       = program counter (scratchpad register 7) 
RD       = scratchpad register addressed by macroinstruction destination field 
           (IR<2:0>) 
RS       = scratchpad register addressed by macroinstruction source field 
           (IR<8:6>) 
SRCOPR   = scratchpad register 10 (not accessible to programmer), used as a 
           temporary for source operands 
a OP b   = operand a (on the A leg of the ALU) and operand b (on the B leg of 
           the ALU) are combined according to the operation specified by the 
           macroinstruction. The ALU function is selected by the auxiliary ALU 
           logic as described in (3.1.2). 
a <- b   = register a is loaded with operand b 


Phase         Cycle  Operation and explanation 

FETCH           1    BA <- PC; DATI; CLKOFF 
                     A read operation is initiated to fetch the instruction 
                     addressed by the program counter. 

                2    IR <- BUSDATA 
                     The instruction is placed in the instruction register. 

                3    PC <- PC+2; BUT IRDECODE 
                     The program counter is incremented to address the next 
                     location in the instruction stream (in this case the 
                     location containing the index for the destination). The 
                     instruction (held in the IR) is decoded by the BUT and 
                     found to be a double-operand instruction, causing a 
                     branch to the microcode for source mode 0. 

SOURCE          4    SRCOPR <- RS; BUT DESTINATION 
                     The contents of the register addressed by the source 
                     field of the instruction (register 2) are copied into the 
                     scratchpad register reserved for source operands. The 
                     next state is determined by the destination addressing 
                     mode and the fact that BIS is a word instruction which 
                     modifies its destination. 

DESTINATION     5    BA <- PC; DATI 
                     A read operation is initiated to get the index word 
                     (pointed to currently by the program counter) for the 
                     effective address of the destination operand. 

                6    PC <- PC+2; CLKOFF 
                     The program counter is incremented to point to the next 
                     instruction. Note that this cycle is overlapped with the 
                     DATI started in cycle 5. 

                7    B <- BUSDATA 
                     The index is stored for use in the next cycle. 

                8    BA <- RD+B; DATIP; CLKOFF 
                     The index is added to the contents of the destination 
                     register to form the effective address of the destination 
                     operand. A DATIP is performed to read the operand since 
                     the operand is to be modified and then restored to its 
                     original location in memory. 

                9    B <- BUSDATA 
                     The destination operand is stored so it is available to 
                     the B leg of the ALU. 

EXECUTE        10    BUSDATA <- SRCOPR OP B; DATO; CLKOFF; BUT SERVICE 
                     The source and destination operands are logically ORed 
                     together and put out on the UNIBUS to be rewritten into 
                     the memory location from which the destination operand 
                     was read. (Note that the destination address is still in 
                     BA.) Upon completion of the DATO, the control unit will 
                     branch into the service phase if a serviceable condition 
                     is pending, otherwise it will branch back to repeat the 
                     fetch phase for the next instruction. Although it 
                     performs an execute phase function, this microinstruction 
                     is part of the same destination mode microroutine that 
                     generated cycles 5 through 9. 


At a detailed level, the instruction interpretation process of each PDP-11 
implementation will vary significantly from that outlined above; however, the scenario 
is still highly representative of the operation of the control unit and data paths in the 
designs to be considered. 
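
The ten-cycle scenario can be followed in software with a small register-transfer sketch. This is a simplified model under stated assumptions rather than a description of any actual implementation: memory is a Python dict of 16-bit words keyed by byte address, bus timing and CLKOFF are ignored, condition codes are not modeled, and the function name and the sample addresses and data are hypothetical.

```python
MASK = 0xFFFF  # 16-bit data paths


def interpret_bis_indexed(regs: list, memory: dict) -> int:
    """Follow the ten microcycles for BIS Rs, X(Rd); return the operand address."""
    # FETCH
    ba = regs[7]                      # 1: BA <- PC; DATI
    ir = memory[ba]                   # 2: IR <- BUSDATA
    regs[7] = (regs[7] + 2) & MASK    # 3: PC <- PC+2; BUT IRDECODE
    rs = (ir >> 6) & 0o7              #    source register field IR<8:6>
    rd = ir & 0o7                     #    destination register field IR<2:0>
    # SOURCE
    srcopr = regs[rs]                 # 4: SRCOPR <- RS
    # DESTINATION
    ba = regs[7]                      # 5: BA <- PC; DATI (index word)
    regs[7] = (regs[7] + 2) & MASK    # 6: PC <- PC+2 (overlapped with the DATI)
    b = memory[ba]                    # 7: B <- BUSDATA
    ba = (regs[rd] + b) & MASK        # 8: BA <- RD+B; DATIP
    b = memory[ba]                    # 9: B <- BUSDATA
    # EXECUTE
    memory[ba] = (srcopr | b) & MASK  # 10: BUSDATA <- SRCOPR OP B; DATO
    return ba


# Hypothetical program: BIS R2, X(PC) at address 1000 (octal) with index 20.
regs = [0] * 8
regs[2] = 0o000017
regs[7] = 0o1000
memory = {0o1000: 0o050267, 0o1002: 0o000020, 0o1024: 0o170000}

address = interpret_bis_indexed(regs, memory)
print(oct(address), oct(memory[address]), oct(regs[7]))  # 0o1024 0o170017 0o1004
```

Note that the PC has already been advanced past the index word when it is used as RD in cycle 8, so the index acts as a displacement from the address of the following instruction.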


3.2 Characterization of Individual Implementations 

A set of common implementation features may be used to characterize each 
midrange PDP-11 to provide the raw data upon which comparisons may be based. A 
summary of these characteristics is given in Tables 2 and 3. 


3.2.1 PDP-11/20 

The 11/20 was the original member of the PDP-11 family. The 11/20 is atypical 
in a number of important aspects. Because the semiconductor read-only memory 
technology which makes microprogramming economically attractive was largely 
undeveloped when the PDP-11/20 was designed, control was implemented in random 
logic in contrast to the microprogrammed control used in all the succeeding members of 
the PDP-11 family. This causes control to be forced into a very stylized form so as to 
minimize the number of control unit states. Finally, the UNIBUS control generates a 
number of signals controlling the operation of the data paths. This makes it necessary 
for the UNIBUS and processor control unit to operate in tight lockstep with each other 
with no possibility of asynchronous data transfer. 

The absence of MSI also has significant impact on the implementation of the data 
paths (Figures 7 and 8). The extensive use of SSI logic has several ramifications 
beyond increased cost and complexity. The Aleg and Bleg MUXes are set up to act as 
latches in addition to acting as data selectors (Figure 8). One may think of a Breg 
being placed between the Bleg MUX and the ALU. The ALU is a simple adder in 
contrast to the multifunctioned TTL MSI 74181 ALUs used in every other 
medium-performance PDP-11. Logical operations are carried out in the Aleg MUX/latch. The 
MUX can select either the true or complemented form of operands to support logical 
NOT. Logical OR is accomplished by gating the two operands into the MUX 
simultaneously (one operand may have been latched beforehand). Logical AND is 
performed by making use of DeMorgan’s Rule (A ∧ B = ~[~A ∨ ~B]). Since there is no 
logic for complementing the output of the Aleg MUX/latch, two cycles are necessary: 
the first to form ~A ∨ ~B, the second to run it through the Aleg MUX again to form the 
complement. The rotate/shift/byte swap logic is built into the MUX following the 
adder. A final peculiarity of the 11/20 is the separate paths provided from the 
UNIBUS for the IR and PS. Interestingly enough, even with all of these rather striking 
differences in implementation, the PDP-11/20 still shows a strong kinship to its 
successors. 
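
The two-pass AND described above can be checked with a short Python sketch. It assumes a simplified model of the Aleg MUX/latch as a 16-bit element that can complement each input and OR the selected inputs; the function names are hypothetical and the model ignores latching and timing.

```python
MASK = 0xFFFF  # 16-bit word


def mux_or(a: int, b: int, complement_a: bool = False,
           complement_b: bool = False) -> int:
    """Model one pass through the MUX: optional complements, then OR."""
    if complement_a:
        a = ~a & MASK
    if complement_b:
        b = ~b & MASK
    return (a | b) & MASK


def and_via_demorgan(a: int, b: int) -> int:
    """Form A AND B in two passes, as there is no output complementer."""
    first_pass = mux_or(a, b, complement_a=True, complement_b=True)  # ~A v ~B
    return mux_or(first_pass, 0, complement_a=True)                  # ~(~A v ~B)


print(oct(and_via_demorgan(0o177400, 0o007760)))  # 0o7400
```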


3.2.2 PDP-11/40 

The PDP-11/40 was designed to improve upon the performance of the 
PDP-11/20 without an increase in price by taking advantage of the TTL MSI 
technology arising after the introduction of the 11/20. With the exception of the 
PDP-11/60 (and the 11/20 which exceeds the 11/40 in cost), the 11/40 is both the 
fastest and most expensive midrange PDP-11 processor. 

The data paths of the 11/40 (Figure 9) correspond closely to those of the 
archetype except in the immediate vicinity of the ALU. What has been indicated as the 
Aleg MUX is really the negative-logic wired OR of a number of signals. Options such as 
the floating-point processor are added by simply tying them into the DMUX output and 
Aleg. Two paths exist out of the PS: one running to the Aleg MUX as in the archetype 
and a second running directly to the UNIBUS as in the 11/20. A path from the Aleg 
MUX directly to the DMUX (equivalent to the AMUX of other models) exists allowing the 
ALU (and thus the propagation delay incurred by passing through it) to be bypassed in 
those cases where the contents of the SPM or PS are to be routed directly back to the 
Breg or SPM. Single-bit shifts and rotates right are handled in the DMUX in a fashion 
similar to the 11/20. Rotate/shifts to the left, however, are performed in the ALU. 
Sign extension and byte swapping are performed in the Bleg MUX. Since the 
scratchpad register may not be both simultaneously read and written, the D register 
(Dreg) is used to hold results generated while the SPM is being read in one processor 
clock phase so that during a later phase they may be written back into the scratchpad. 
In this way the Dreg permits read/write access of the SPM within a single cycle. A 
final feature is the presence of two paths into the bus address register, one from the 
Aleg MUX and one from the ALU. This is of benefit in such operations as 
autoincrement and autodecrement addressing modes in which the contents of a register 
can be modified and either the premodification (autoincrement) or postmodification 
(autodecrement) value of the register put into the bus address register in a single 
cycle. 


The 11/40 microprogrammed control unit is quite elaborate to gain full benefit 
of the potential of the data paths. Among its features are overlapped processor/ 
UNIBUS operation and three selectable microcycle clock periods. The latter feature 
increases performance immensely since the maximum cycle time of 300 nanoseconds is 
needed only when a full circle from scratchpad through ALU and back to scratchpad is 
made. In cycles which do not write into the scratchpad, a 200 nanosecond cycle may 
be selected. When the data paths are unused and only microbranching is involved, an 
even shorter cycle time of 140 nanoseconds is possible. A final unique feature of the 
11/40 is a variation in the branch-on-microtest logic from that of the archetypical 
control unit. To increase microbranch speed, the microword BUT select field is 
buffered in the microword register rather than being routed directly from the control 
store to the BUT logic. This causes a one-cycle delay in processing the branch and 
forces all BUTs to be placed one microinstruction ahead of where they are to take 
effect. In some cases dummy steps are required to provide sufficient lead time for 
BUT action to occur, somewhat offsetting the speedup of this arrangement. 
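
The benefit of selectable clock periods can be illustrated with a small calculation. The three periods come from the text; the cycle category names and the sample microsequence are hypothetical and chosen only to show the arithmetic.

```python
# Microcycle periods of the 11/40 in nanoseconds, as given in the text.
CYCLE_NS = {
    "scratchpad_write": 300,  # full circle: scratchpad -> ALU -> scratchpad
    "no_scratchpad_write": 200,
    "branch_only": 140,       # data paths unused, microbranching only
}


def sequence_time_ns(cycle_kinds: list) -> int:
    """Total time of a microsequence with a per-cycle selectable period."""
    return sum(CYCLE_NS[kind] for kind in cycle_kinds)


def fixed_clock_time_ns(cycle_kinds: list) -> int:
    """Time for the same sequence if every cycle used the maximum period."""
    return len(cycle_kinds) * max(CYCLE_NS.values())


# Hypothetical four-cycle microsequence.
sequence = ["no_scratchpad_write", "scratchpad_write",
            "branch_only", "no_scratchpad_write"]
print(sequence_time_ns(sequence), fixed_clock_time_ns(sequence))  # 840 1200
```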

One way in which the 11/40 uses its processor/UNIBUS overlap feature to 
advantage is by prefetching words from memory whenever possible. At the end of the 
fetch phase, a check is made to see if the next memory reference fetches an 
instruction or operand index. If it does, the read access is begun immediately using 
the contents of the PC as the address. Exceptions to this are when the PC is used as 
a destination or when a service request is pending, both of which mean that the 
current value of the PC won’t be the address of the next instruction. Starting the 
access early allows it to proceed in parallel with the execution of the current 
instruction. This reduces the time the processor idles waiting for the accessed word. 
Updating of the PC is deferred until the proper point in the instruction interpretation 
process is reached. This guarantees that references to the PC will result in the 
proper value being used. 


3.2.3 PDP-11/10 

The PDP-11/10 was designed as a minimal-cost processor. The implementation 
is again TTL MSI but stripped to the bare essentials without the elaboration of the 
11/40. The data paths of the 11/10 (Figure 10) follow the conventions of the 
archetype closely. A constant zero may be selected onto the AMUX in addition to ALU 
or UNIBUS data. The ALU Aleg multiplexor allows selection of the PS, some constants, 
and some internal addresses as well as the scratchpad memory. The Breg is 
implemented as a universal bidirectional shift register so that single-bit shifts and 
rotates may be performed without additional logic. The ALU Bleg multiplexor includes 
the constants one and zero and permits sign extension of the low-order byte of the B 
register. The scratchpad memory may not be both read and written in the same cycle, 
thus operations such as incrementing the PC which takes only a single microcycle on 
other processors take two microcycles to complete on the 11/10. A byte swapping 
path is absent in the 11/10. As a consequence odd-byte addressing and swapping 
must be accomplished by a series of eight shifts or rotates. 
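
Why eight single-bit rotates amount to a byte swap can be seen in a brief sketch. It assumes a plain 16-bit circular rotate with no carry bit involved; the function names are hypothetical.

```python
MASK = 0xFFFF  # 16-bit word


def rotate_left_1(word: int) -> int:
    """Circular left rotate of a 16-bit word by one bit position."""
    return ((word << 1) | (word >> 15)) & MASK


def swap_bytes_by_rotation(word: int) -> int:
    """Interchange the two bytes of a word using eight one-bit rotates."""
    for _ in range(8):
        word = rotate_left_1(word)
    return word


print(hex(swap_bytes_by_rotation(0x12AB)))  # 0xab12
```

Each rotate costs a microcycle in this model, which is the overhead that a dedicated byte swapping path removes.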

The 11/10 control unit has a relatively austere implementation. There is no 
microword register in the control unit although there is necessarily a microaddress 
register. As a consequence, the output of the control store is used directly to 
condition the data paths. This precludes the overlap of current microinstruction 
execution with next microinstruction fetch. Hence, the propagation delay of the control 
store must be added to that of the data paths in setting the microcycle time, causing it 
to be a relatively long 300 nanoseconds. The simplicity of the data paths allows the 
use of a microword only 40 bits wide. The microcode contains very few frills and 
gains very little in performance from special cases. A notable example of this is the 
jump address calculation for JMP and JSR instructions. The 11/10 uses the same 
section of microcode for JMP and JSR destination modes as it uses to fetch 
conventional destination operands. This costs an extra memory reference over the 
separate microroutines used in other PDP-11 processors since not only is the effective 
address of the jump calculated, but also its contents are fetched (the microprogram 
logic precludes using this operand as a prefetched instruction even though this is 
effectively what it is). Overlapped processor/UNIBUS operation allows some of the 
extra microcycles necessitated by the data paths to be effectively hidden by putting 
them in parallel with UNIBUS accesses. The other concession to performance is clock 
speed doubling during shift operations to partially compensate for the performance 
lost in the absence of a byte swapper. 


3.2.4 PDP-11/04 

The PDP-11/04 is the simplest PDP-11 except for the LSI-11. Although simple, 
the 11/04 embodies a very good set of design tradeoffs. Figure 11 diagrams the 
11/04 data paths. The scratchpad memory has a register (SPreg, part of the SPM 
shown in Figure 11) sitting between it and the AMUX. This register allows the 
scratchpad to support read/modify/write accesses saving a microcycle in each such 
access over the 11/10. A multiplexor sitting before the SPM implements the swap 
byte operation, allowing the halves of a word to be interchanged. This improves byte 
operation performance considerably over the 11/10 and obviates the need for the 
11/10’s fast shift logic. Also eliminated is overlapped processor/UNIBUS operation 
since the savings from it are reduced with the overall reduction in number of 
microcycles. 

The AMUX (the major data bus and the multiplexor which drives it) can select 
the PS and a number of constants in addition to ALU output and UNIBUS data. 
Between the SPM and ALU is a ones complementer so that the 74181 ALU may be used 
to perform the Bleg minus Aleg operation used in the subtract instruction in addition to 
the Aleg minus Bleg operation used in the compare instruction. The Aleg MUX also 
directly drives the UNIBUS address lines without a bus address register (if processor/ 
UNIBUS overlap were used, a BA register would have been necessary). Between the 
Breg and ALU is a multiplexor which allows the Breg, sign-extended low-order byte of 
the Breg, or the constants zero or one to be selected into the Bleg of the ALU in a 
manner identical to that of the Bleg MUX of the 11/10. The Breg is also identical to 
that of the 11/10 in that it is a bidirectional shift register implementing rotate/shifts. 

The final contributor to increased performance of the 11/04 is the decrease in 
cycle time from 300 nanoseconds in the 11/10 to 260 nanoseconds, made possible in 
part by pipelining the microword fetch. On the whole, the 11/04 is superior in 
performance to the 11/10 in all cases except the fetch phase and certain addressing 
modes where the use of its processor/UNIBUS overlap capability is sufficient to put 
the 11/10 ahead. 


3.2.5 PDP-11/34 

The PDP-11/34 is an elaboration of the 11/04. The 11/34 data paths (Figure 
12) bear close resemblance to those of the 11/04. The 11/04 complementer has been 
replaced in the 11/34 by additional microcode which reverses the placement of source 
and destination operands on the A and B legs of the ALU during the subtract 
instruction from that of the other double operand instructions. This frees the 11/34 
from performing the adjustments that must be made in the data paths of other PDP-11 
processors to make the subtract instruction operate correctly under the restrictions of 
the 74181 ALU. Added is a B extension register (BXreg) which, when concatenated 
with the Breg, forms a 32-bit register for double-width operands and results 
manipulated by extended instruction set operations such as multiply and divide. Also 
notable is the relocation of the byte swapper to the tail of the AMUX allowing odd- 
byte accessing to occur as data is entered from or placed upon the UNIBUS without 
the customary extra microcycle needed in other implementations to right adjust the 
byte. Included with the byte swapper is the sign extension logic. Schottky TTL is 
used in critical places in the data paths, notably the ALU, to speed up microcycle time 
from the 260 nsec of the 11/04 to 180 nsec. Additional hardware for memory 
management (not shown in Figure 12) and extended instruction set microcode are 
standard features. 

The 11/34 microprogrammed control unit makes some concessions to the 
improved performance of the data paths. In addition to the normal 180 nanosecond 
cycle, there is a 240 nanosecond cycle used primarily for UNIBUS operations. Again, 
there is no processor/UNIBUS overlap feature because considerations of simplicity (i.e. 
cost) outweighed the incremental improvement in performance that would be netted. 
Because of its additional logic, the PDP-11/34 has a wider microword than the 11/04 
(48 bits versus 40 bits). Also, since many more cases are broken out by the BUT 
IRDECODE in the 11/34 than in the machines preceding it, the size of the control store 
has been increased to 512 words, double that of earlier horizontally microprogrammed 
implementations. 


3.2.6 PDP-11/60 

The PDP-11/60 is the latest implementation covered in this paper and in many 
ways the most distinctive. Its design exploits advances in circuit technology occurring 
since the introduction of the earlier models giving it a number of features which set it 
apart from other PDP-11 family members. Two major enhancements are a larger 
microcode addressing space, making an integral floating-point instruction set and a 
writable control store option feasible, and a cache memory^. Both are possible due to 
increases in the density and decreases in the cost of bipolar ROM and RAM [Mudg77]. 


^ The PDP-11/70 also uses a cache. 
As illustrated in Figure 13, the 11/60 data paths show significant differences 
from those of other midrange implementations. A major difference is the presence of 
three scratchpad memories feeding the ALU. Scratchpads A and B are 32-word-by-16-bit 
register arrays, each having twice the number of registers of the single 
scratchpad found in other midrange designs. As with the 11/45 (Section 5), 
the contents of the general registers are kept in both scratchpads allowing different 
registers to be read onto the A and B legs of the ALU simultaneously within the same 
cycle. This speeds register-to-register operations. The additional registers in the A 
and B scratchpads are used as floating-point registers by the integral floating-point 
microcode, working storage by user microprograms, and console, maintenance, and 
status registers by the processor. Scratchpad C is a 16-word-by-16-bit array which 
holds bus data and constants used by the processor and takes the place of the 
constants ROM on the B leg of other midrange implementations. During exceptional 
situations these constants may be overwritten with other information but must be 
restored before execution of the base machine microcode may be resumed. 

The 11/60 is the first PDP-11 implementation to make use of three-state 
devices to eliminate many of the multiplexors used in other designs (the 11/40 uses 
open-collector logic on the Aleg bus to the same effect). For instance, instead of 
actual Aleg and Bleg MUXes, the 11/60 uses registers and combinatorial elements with 
three-state outputs that can be independently enabled onto a common bus for each 
ALU leg. The ALU itself is the conventional ’181 type used in all of the other MSI 
implementations. As in the 11/40, the D register (Dreg) latches the ALU output so that 
results may be rewritten to the scratchpads during a later clock phase of the 
microcycle in which they are generated. The output of the Dreg is the major, but not 
sole, feedback route in the data paths. 

The bus address register (BA) is loaded from the Aleg bus as in the 11/04 and 
11/34. The address out bus is driven by the BA and supplies addresses to the 
memory subsystem (cache, relocation hardware, and UNIBUS interface). The bus data 
in (DIN) bus routes data into the processor from the memory subsystem, internal 
registers accessed via UNIBUS addresses such as the PS, and constants emitted by the 
microinstruction word. Scratchpad C and the instruction register are loaded directly 
from DIN in a manner reminiscent of the 11/20. A register in SPM C is set aside 
specifically for transfers from memory to the data paths. Results are routed from the 
data paths back to the memory subsystem and internal registers via a separate bus 
data out (DOUT) bus. 

As compared to the other midrange machines, several data path elements are 
unique to the 11/60. The counter (Cntr) is an iteration counter used by the extended 
instruction set and floating-point microcode. The shift register and shift register guard 
(shown together as the SR in Figure 13) can be loaded in parallel with Dreg and 
shifted one position right or left. Either all or the low-order seven bits of the SR may 
be gated onto the Aleg bus through the XMUX (not shown). The shift tree is a network 
of multiplexors used for byte swapping, sign extension, and field isolation and 
positioning. It is unusual in that it allows right shifts of from 1 to 14 bit positions 
combinatorially in a single microcycle. 

The PDP-11/60 control unit is horizontally microprogrammed in much the same 
manner as the other midrange implementations. Extensive use of Schottky logic 
throughout the processor allows a fixed 170 nanosecond microcycle time. Processor/ 
UNIBUS communication is interlocked unlike either the 11/40 or 11/45. There are 
several significant differences from the more conventional implementations. Many of 
these differences are generalizations of the microprogram flow control mechanism to 
allow more functions of the base machine to be performed by microcode rather than 
hardwired logic and to create a user microprogramming environment which can be put 
to uses beyond executing the PDP-11 instruction set. The 11/60 has a larger and 
more generalized set of BUTs than earlier machines. Also included for the first time in 
a horizontally microprogrammed machine is a multilevel microsubroutine call/return 
capability. 

Increased reliance on microcode has expanded the control store to 4096 words 
by 48 bits. 2560 words of this are used to implement the basic machine. The 
remaining 1536 words are available to the user through a ROM control store option; 
1024 are available through a writable control store option. Since addressing the 
microstore requires 12 bits, a page-addressing scheme has been adopted to avoid 
widening the microword. Page size is 512 words reducing microaddresses to 9 bits 
within a page. Microbranches across a page boundary require that an additional 3-bit 
page field be specified. 
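
The page-addressing arithmetic can be made concrete with a short sketch. The field widths (3-bit page, 9-bit offset) follow from the figures in the text; the function names are hypothetical, and the sketch says nothing about how the hardware actually holds the current page.

```python
PAGE_BITS = 3     # 8 pages
OFFSET_BITS = 9   # 512 words per page


def split_microaddress(address: int) -> tuple:
    """Split a 12-bit microaddress into (page, offset within page)."""
    if not 0 <= address < (1 << (PAGE_BITS + OFFSET_BITS)):
        raise ValueError("microaddress must fit in 12 bits")
    return address >> OFFSET_BITS, address & ((1 << OFFSET_BITS) - 1)


def join_microaddress(page: int, offset: int) -> int:
    """Rebuild the full 12-bit microaddress from its page and offset."""
    return (page << OFFSET_BITS) | offset


print(split_microaddress(4095))   # (7, 511)
print(split_microaddress(2560))   # (5, 0)
```

The second example shows that the 2560 words of base machine microcode correspond to exactly five 512-word pages.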

Another concept used extensively in the 11/60 to reduce microword size is 
residual control. In this technique relatively static control information is kept in set-up 
registers separately from the microword. The microprogram must load these registers 
to affect the data path elements which they control. Set-up registers are used in the 
11/60 to gate registers onto the DIN bus, enable data into registers from the DOUT bus, 
select SR functions, and control certain actions of the shift tree. 

The overlapping of a number of different control fields by bit steering is a final 
means of keeping the microword relatively narrow. Certain bits in the microword 
control the interpretation of corresponding microword fields. This allows a single field 
to control several different functions. The one drawback of this technique is that 
these functions become mutually exclusive within a single microword since their 
simultaneous use would involve two different interpretations of the same microfield. 
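
Bit steering can be illustrated with a toy decoder. The layout below is entirely hypothetical (a 1-bit steering field selecting between two interpretations of a shared 4-bit field) and is not the 11/60 microword format; it only shows why the two functions become mutually exclusive within one microword.

```python
def decode_shared_field(microword: int) -> dict:
    """Decode a hypothetical 5-bit fragment: bit 4 steers bits 3..0."""
    steering_bit = (microword >> 4) & 1
    shared_field = microword & 0xF
    if steering_bit == 0:
        # Interpretation A: the shared field selects an ALU function.
        return {"alu_function": shared_field}
    # Interpretation B: the same bits give a shift count instead.
    return {"shift_count": shared_field}


print(decode_shared_field(0b00101))  # {'alu_function': 5}
print(decode_shared_field(0b10101))  # {'shift_count': 5}
```

A single microword in this model can never specify both an ALU function and a shift count, which is the drawback noted above.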

Hardwired logic in the memory subsystem detects internal addresses in a manner 
similar to other PDP-11 processors. However, the actual access to these registers is 
accomplished through microcode instead of additional control logic. Internal address 
access has been added to the exceptional conditions detected by the JAM logic of the 
11/60. If the JAM microroutine finds that a microtrap has been caused by an internal 
address access, then an intraprocessor transfer to or from the addressed register is 
performed. Unlike other JAM sequences, such transfers are terminated by resuming 
the interrupted microprogram. Microcoded register access requires much more time 
than the corresponding hardwired access. Reading the PS, for instance, takes 33 
microcycles or 5.610 microseconds using microcode where a single microcycle suffices 
for the hardwired approach. This is justified, however, by the decreased cost of 
microcode versus hardwired logic and by the infrequent access made to these 
registers. 
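
The timing quoted for reading the PS follows directly from the fixed microcycle time, as this small calculation shows. The function name is hypothetical; the cycle counts and the 170 nanosecond period come from the text.

```python
MICROCYCLE_NS = 170  # fixed 11/60 microcycle time


def access_time_ns(microcycles: int) -> int:
    """Time spent in a register access of the given number of microcycles."""
    return microcycles * MICROCYCLE_NS


microcoded_ns = access_time_ns(33)  # microcoded read of the PS
hardwired_ns = access_time_ns(1)    # single-cycle hardwired access

print(microcoded_ns / 1000, "microseconds")         # 5.61 microseconds
print(microcoded_ns // hardwired_ns, "times slower")  # 33 times slower
```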
Like the 11/40, the 11/60 prefetches instructions and operand indices whenever 
possible. Unlike the 11/40, the PC is incremented at the time the prefetch is 
performed. Because of this, prefetching cannot be done when the current instruction 
uses the PC as either a source or destination register. A second difference is that 
service requests are not polled until the end of the current instruction, when the next 
instruction may already be prefetched and the PC updated. When this occurs, two 
microcycles must be spent to decrement the PC to restore its old value before 
proceeding with the service phase. 


4. Implementation of a Minimal-Cost PDP-11 

The LSI-11 (known in packaged form as the PDP-11/03) is designed for the low- 
end market where there is more concern for low cost than high performance. 
Integrated circuit package count and printed circuit board area, the main determinants 
of manufacturing cost, are kept low through an n-channel MOS LSI technology 
implementation of the CPU. The result is a PDP-11 processor with four kilowords of 
semiconductor memory on a single 8.5" x 10.5" (standard DEC quad height) printed 
circuit board which can execute the entire PDP-11/40 instruction set. 

The constraints imposed by current semiconductor technology dictate much of 
the implementation of the LSI-11. The entire CPU consists of four LSI packages plus a 
number of standard TTL SSI and MSI packages for clock generation and bus 
interfacing. A system control chip provides microinstruction addressing logic plus an 
interface to external signals used in bus control. A data paths chip contains the 
registers and arithmetic/logic unit of the machine. Two chips are microcode ROMs 
(MICROMs). Each contains 512 microinstruction words with a width of 22 bits. An 
optional third MICROM adds the extended instruction set/floating-point instruction set 
option of the PDP-11/40. To decrease the complexity of the machine, the traditional 
UNIBUS was abandoned in favor of a scheme requiring fewer bus lines. Most notable 
is the multiplexing of both data and addresses onto a single set of 18 data/address 
lines, DAL<17:00>. A significant savings over the 34 lines dedicated to data and 
address in the UNIBUS results at the expense of bus cycle speed. 

The 22-bit microinstruction word of the LSI-11 is quite narrow compared to the 
microwords of the horizontally microprogrammed PDP-11s which range from 40 to 64 
bits wide. Four bits are not decoded and provide direct TTL-compatible signals which 
are used by logic external to the CPU chips. Another two bits are used within the CPU 
chips to control next microinstruction addressing. The remaining 16 bits are decoded 
as a microinstruction by the CPU chips. LSI-11 microinstructions differ little in form 
from conventional minicomputer instructions with their opcode and operand (which may 
be register, microcode address, or literal) fields. These require a great deal more 
decoding than the horizontal microinstructions of other designs. 
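
The field widths given above (4 + 2 + 16 = 22 bits) can be illustrated with a short Python sketch that splits a microword into its three fields. The bit positions are an assumption made for the example; the text gives only the widths, and the actual LSI-11 bit assignment may differ.

```python
# Hypothetical field layout for a 22-bit LSI-11 microword.
# Only the widths come from the text; the bit positions chosen
# here are an assumption for illustration.
TTL_BITS = 4           # undecoded, TTL-compatible control signals
NEXT_BITS = 2          # next-microinstruction addressing control
MICROINSTR_BITS = 16   # decoded as a microinstruction by the CPU chips
WORD_BITS = TTL_BITS + NEXT_BITS + MICROINSTR_BITS


def split_microword(word):
    """Split a 22-bit integer into (ttl, next_ctl, microinstr) fields."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("microword must fit in 22 bits")
    microinstr = word & ((1 << MICROINSTR_BITS) - 1)
    next_ctl = (word >> MICROINSTR_BITS) & ((1 << NEXT_BITS) - 1)
    ttl = word >> (MICROINSTR_BITS + NEXT_BITS)
    return ttl, next_ctl, microinstr
```

The 16-bit field would still need opcode and operand decoding of its own, which is the extra decoding work the paragraph above refers to.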

The LSI-11 microstore is larger than the control store of any other PDP-11 
except the 11/60. Since LSI-11 microinstructions lack the possibilities for parallelism 
inherent in the horizontal microinstructions, more LSI-11 microinstructions are needed 
to code a given operation. In addition, certain functions which are handled with 
combinatorial logic in other PDP-11 control units and data paths are microcoded in the
LSI-11. Finally, the LSI-11 has more elaborate console microcode than the other
implementations. As a result, the LSI-11 has 22528 bits of microstore versus 14336
bits for the PDP-11/40, 16384 bits for the PDP-11/45, and 122880 bits for the
PDP-11/60. The narrow microword is used in spite of its attendant problems due to a
limitation imposed by the packaging of the MOS CPU chips. Only 40 pins are available 
to carry power and signals to and from each chip, limiting the number of lines available 
for transmitting the microword from the MICROMs to the control and data path chips. 
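
As a quick arithmetic check on the microstore sizes, the sketch below multiplies out the two control store organizations that the text states explicitly: two 512-word by 22-bit MICROMs for the standard LSI-11, and the 256-word by 64-bit control store given for the PDP-11/45 in Section 5. The 11/40 and 11/60 totals are simply quoted, since their organizations are not given in this section.

```python
# Control store organizations stated in the text (words, bits per word).
LSI11_MICROMS = 2              # two standard MICROM chips
LSI11_WORDS_PER_MICROM = 512
LSI11_WORD_BITS = 22
PDP1145_WORDS = 256            # from the 11/45 control unit description
PDP1145_WORD_BITS = 64

lsi11_bits = LSI11_MICROMS * LSI11_WORDS_PER_MICROM * LSI11_WORD_BITS
pdp1145_bits = PDP1145_WORDS * PDP1145_WORD_BITS

# Totals quoted in the text for the other two machines (not derived here).
quoted_bits = {"PDP-11/40": 14336, "PDP-11/60": 122880}

print(lsi11_bits, pdp1145_bits)
```

The products come to 22528 bits for the LSI-11 and 16384 bits for the 11/45.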

Technology also imposes a serious constraint on instruction decoding. The 
equivalent of a branch on microtest allows only 8 bits to be decoded at a time. This is 
sufficient for decoding the majority of instructions; however, the remainder require 
additional decoding which may consume as many as 8 microcycles. This is in marked 
contrast with all other PDP-11s, which require only a single microcycle to do the initial
instruction decode at the end of the fetch phase (BUT IRDECODE)*. The effect that
this has on the average duration of the LSI-11 fetch phase is evident from Table 4. 

Figure 14 details the data paths around which the operands of the 
macroinstruction level machine circulate. As with the medium-performance 
implementations, the ALU is the hub of activity, operating upon quantities supplied 
from the scratchpad memory. The AMUX selects among the output of the ALU, the high 
or low byte of the data/address lines, and the processor flags. The selected quantity 
is fed back to be rewritten into the scratchpad. Constants supplied as literals from the 
microinstruction word may be gated into the data paths through the Bleg MUX to the 
ALU. Additional paths exist for transmitting information in and out on the data/address 
lines. 


Significant differences exist between the data paths of the LSI-11 and the 
midrange machines in addition to the similarities. One major difference is in the width 
of the data paths. The LSI-11 is the only member of the PDP-11 family with data 
paths 8 bits rather than 16 bits wide. This is necessitated by limitations in current 
semiconductor chip density. Bus paths in particular occupy large amounts of chip real 
estate dictating their reduction in width. Since only 8 bits of data can be processed at 
a time, two microcycles are required to accomplish any 16-bit operation. A second 
effect is the elimination of logic that would otherwise be necessary to configure the 
data paths for both byte and word operations. A last unique characteristic is the 
absence of a B register for feeding the B leg of the ALU. Instead, the B leg is fed
from a second read port into the scratchpad memory. In this the LSI-11 bears a 
curious resemblance to the PDP-11/45 and 11/60. The difference is that while the 
LSI-11 uses this feature to eliminate cycles that would be needed to load a Breg, there 
is not sufficient logic to allow source and destination registers to be accessed 
simultaneously. Consequently, multiple cycles are still required to set up register/ 
register operations on the LSI-11. 
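
The cost of the 8-bit data path can be sketched as follows: a 16-bit addition becomes two passes through an 8-bit adder, with the carry from the low byte fed into the high byte. This is an illustrative model only, not the LSI-11 microcode; the function names are hypothetical.

```python
def alu8(a, b, carry_in):
    """One pass through an 8-bit adder: returns (sum byte, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8


def add16_on_8bit_path(x, y):
    """Add two 16-bit words as two 8-bit ALU passes (two 'microcycles')."""
    low, carry = alu8(x & 0xFF, y & 0xFF, 0)     # cycle 1: low bytes
    high, carry = alu8(x >> 8, y >> 8, carry)    # cycle 2: high bytes plus carry
    return (high << 8) | low, carry
```

For example, `add16_on_8bit_path(0x00FF, 0x0001)` propagates the low-byte carry into the high byte and returns `(0x0100, 0)`.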

The final important performance factor is again a direct result of the circuit
technology employed. NMOS logic is not as fast as the bipolar logic found in every
other PDP-11 implementation so that the microcycle time of the LSI-11 is 400
nanoseconds or one-third slower than the next slowest PDP-11. This coupled with the
larger number of microcycles necessary to execute a given macroinstruction causes
the LSI-11 to lag in performance.

* The 11/60 requires two microcycles to decode certain instructions.


5. Implementation of a High-Performance PDP-11 

The PDP-11/45 was designed for maximum performance and followed the 11/20 
to become the second member of the PDP-11 family. Maximum performance is 
achieved with a complex set of data paths allowing highly parallel operation and an 
optional high-speed semiconductor memory (bipolar or MOS) with its own path into the 
processor called the Fastbus. The extensive use of Schottky TTL in the processor 
makes possible a 150 nanosecond cycle time, half as long as that in some midrange 
designs. 

The complexity of the PDP-11/45 data paths is evident from Figure 15 even
with several of the special-purpose registers and buses omitted for clarity. The
overall organization still bears some resemblance to the midrange PDP-11 data paths,
however. The ALU remains the hub of data path activity with its output the primary
feedback path to the processor registers, although not the only one as in other
implementations. The ALU is based upon the Schottky equivalent of the 74181 chip
used in most other PDP-11 designs. The difference begins with the multiplexors
driving the A and B legs of the ALU. These MUXes allow operands to be routed
directly to the proper leg without using additional cycles to move operands from
register to register. K0MUX and K1MUX (combined in Figure 15) are multiplexors used
in conjunction with the BMUX to gate constants, trap vector addresses, and branch
offsets into the B leg of the ALU.

Among the registers supplying the AMUX and BMUX are the source and 
destination operand registers (Sreg and Dreg respectively). These are in turn supplied 
by the SRMUX and DRMUX which select data from individual scratchpad registers or the 
program counter. Besides holding operands from the general registers, the Sreg and
Dreg act as working registers. In particular, Dreg is a shift register used to accumulate
the less significant half of results during multiply and divide.

Separate scratchpads are maintained so that source and destination general 
registers may be read simultaneously and independently. This necessitates both 
scratchpads being written together to keep their contents identical. Each scratchpad is 
organized as 16 words of 16 bits each. Fifteen words in each scratchpad are actually 
used: two sets of general registers R0 through R5 and three sets of stack pointers 
(R6). Register set selection is controlled by status bits in the PS and permits fast 
context switching by eliminating the need to save and restore registers. 
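
The duplicated scratchpad arrangement can be modeled with a minimal sketch: every write goes to both copies, while the two copies are read independently so that source and destination registers are available together. The class and method names, and the flat 0..15 word addressing, are assumptions made for illustration.

```python
class DualScratchpad:
    """Two identical 16 x 16-bit register files, written together.

    Hypothetical model: the text states only the 16 x 16 organization and
    that both copies are written together and read independently.
    """

    def __init__(self):
        self.source_pad = [0] * 16
        self.dest_pad = [0] * 16

    def write(self, index, value):
        # Every write goes to both copies so their contents stay identical.
        self.source_pad[index] = value & 0xFFFF
        self.dest_pad[index] = value & 0xFFFF

    def read_pair(self, source_index, dest_index):
        # One copy serves the source read, the other the destination read,
        # so both operands are available in the same cycle.
        return self.source_pad[source_index], self.dest_pad[dest_index]
```

The design trades a second register file (and the obligation to write both) for the cycle that a single-ported scratchpad would spend on the second read.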

The program counter is not maintained in the scratchpad registers as in other
PDP-11s. Rather, it is held separately so that it may be routed directly to the BAMUX
while the Sreg and Dreg are occupied with other operations. Moreover, two program
counters are implemented. PCB holds the current value of the program counter and is
used as a general register or bus address. PCA holds the new value of the program
counter, allowing the PC to be updated while the old PC value is still in use, after which
PCB is clocked to load it with the new value contained in PCA.

The SHFMUX can right shift or byte swap data from the ALU before it is clocked 
into the scratchpads. It also provides a route from PCB to the Sreg and/or Dreg when
the PC is used as a general register. This arrangement precludes the shifting or byte 
swapping of data being loaded into the PC that is possible with data destined for one 
of the other general registers residing in the scratchpads. As a consequence, 
arithmetic shift left and byte swap operations on the PC do not cause the PC to be 
modified, although the condition codes are updated as though it were. 

Processor access to the UNIBUS, Fastbus, and internal registers is via the bus 
register MUX (BRMUX), the bus register (BR and BRA), and the data out MUX (DMUX). 
The BR and BRA (the duplication is due to electrical loading considerations) are 
logically a single register as shown in Figure 15. They receive all incoming data and 
transmit almost all outgoing data in addition to accumulating the more significant half of 
results during multiply and divide. The BRMUX selects the input to the BR (and BRA) 
from among the two external buses and internal input bus for input to the processor 
and from the SHFMUX for output from the processor via the BR and DMUX to the 
external buses and internal output bus. The internal buses connect a number of 
special registers and an optional floating-point processor to the data paths. Of these, 
only the PS is indicated in Figure 15. The instruction register (duplicated as IR and
AFIR, again for electrical loading reasons) is also loaded from the BRMUX but is only
clocked when an instruction is fetched.

Bus addresses are applied directly to the UNIBUS or to an optional memory 
mapping unit by the bus address multiplexor (BAMUX). No bus address register is 
needed since memory access and processor clocking are fully interlocked except 
during an overlapped fetch in which case the PCB is held selected while operations 
continue in other parts of the data paths. 

The PDP-11/45 control unit is horizontally microprogrammed and is for the most 
part quite similar to the archetype described for midrange PDP-11 implementations. 
The control store is 256 words by 64 bits. The relatively wide microword is 
necessary for generating the large number of control signals used in conditioning and 
clocking the complicated data paths. An additional source of complexity is the timing 
logic needed to produce and use the five processor clock phases. 

There are two classes of microsequence-altering functions corresponding to the 
BUTs of other PDP-11s. The first class consists of simple branches having four or
fewer possible branch addresses. These operate in the same fashion as BUTs. The 
second class of branches consists of three complex instruction decoding functions 
called forks. The first, fork A, does the initial instruction decode and corresponds to 
the BUT IRDECODE of other implementations. Fork B dispatches to an execute phase 
microroutine following a destination operand fetch. Fork C dispatches to a destination 
phase microroutine following a source operand fetch. A fork enable field in the 
microword is used to enable one fork at most during a cycle. When a fork and branch 
are combined in the same cycle, the fork is disabled if the branch is taken. This 
permits the implementation of certain functions without the use of additional cycles. 
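
The interaction between branches and forks described above amounts to a priority rule, which the following sketch makes explicit. It is a simplified model under stated assumptions: the function and parameter names are hypothetical, and the real 11/45 address generation logic is not described in enough detail here to reproduce.

```python
def next_microaddress(default_next, branch_taken, branch_target,
                      fork_enabled, fork_target):
    """Pick the next microaddress for one cycle (hypothetical model)."""
    if branch_taken:
        # A taken branch disables any fork enabled in the same cycle.
        return branch_target
    if fork_enabled:
        # At most one fork (A, B, or C) is enabled per cycle.
        return fork_target
    return default_next
```

Combining the two in one microword lets a single cycle both test a condition and, if the test fails, dispatch through the fork.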

The 11/45 microcode is structured to take full advantage of the data paths and
processor/UNIBUS overlap. Besides intensively exploiting special cases in the 
addressing modes and instruction set, the microprogram implements operand and 
instruction fetch overlap in much the same way as the 11/40. The one difference 
between the two prefetch mechanisms is that the 11/45 updates the PC value in PCB 
and stores it in PCA at the time the prefetch is started. References to the PC work 
correctly because PCB holds the old PC value until it is updated at the appropriate 
time. 


All the design decisions described above are directed toward implementing the 
fastest system possible. Tradeoffs involving circuit technology and control unit and 
data path organization have all been made with this end in mind. 


6. Measuring the Effect of Design Tradeoffs on Performance 

There are two alternative approaches to the problem of determining just how 
the particular binding of different design decisions affects the performance of each 
machine: 

1) Top-down approach - Attempt to isolate the effect of a particular design 
tradeoff over the entire space of implementations by fitting the individual 
performance figures for the whole family of machines to a mathematical 
model which treats the design parameters as independent variables and 
performance as the dependent variable. 

2) Bottom-up approach - Make a detailed sensitivity analysis of a particular 
tradeoff within a particular machine by comparing the performance of the 
machine both with and without the design feature while leaving all other 
design features the same. 

Each approach has its assets and liabilities for assessing design tradeoffs. The
first method requires no information about the implementation of a machine, but does
require a sufficiently large collection of different implementations, a sufficiently small 
number of independent variables, and an adequate mathematical model in order to 
explain the variance in the dependent variable to some reasonable level of statistical 
confidence. The second method, on the other hand, requires a great deal of knowledge 
about the implementation of the given system and a correspondingly great amount of 
analysis to isolate the effect of the single design decision on the performance of the 
complete system. The information that is yielded is quite exact, but applies only to the 
single point chosen in the design space and may not be generalized to other points in 
the space unless the assumptions concerning the machine’s implementation are similarly 
generalizable. In the following subsections the first method is used to determine the 
dominant tradeoffs and the second method is used to estimate the impact of individual 
implementation tradeoffs. 


6.1 Quantifying Performance

Measuring the change in performance of a particular PDP-11 processor model 
due to design changes presupposes the existence of some performance metric. 
Average instruction execution time was chosen because of its obvious relationship to 
instruction stream throughput. Neglected are such overhead factors as direct memory 
access, interrupt servicing, and, on the LSI-11, dynamic memory refresh. Average 
instruction execution times may be obtained by benchmarking or by calculation from 
instruction frequency and timing data. The latter method was chosen due to its 
freedom from the extraneous factors noted above and from the normal clock rate 
variations found from machine to machine of a given model. This method also allows us 
to calculate the change in average instruction execution time that would result from 
some change in the implementation. Such frequency-driven design has already been 
applied in practice to the PDP-11/60 [Mudg77]. 

The instruction frequencies are tabulated in Appendix A and include the 
frequencies of the various addressing modes. These figures were calculated from 
measurements made by Strecker [Stre76b] on 7.6 million instruction executions traced 
in 10 different PDP-11 instruction streams encountered in various applications. While 
there is a reasonable amount of variation of frequencies from one stream to the next, 
the figures of Appendix A should be representative. 

Instruction times are tabulated in Appendices B through I. These times were 
calculated from the engineering documents for each machine. The times vary from 
those published in the PDP-11 processor handbooks for two reasons. First, in the 
handbooks, times have been redistributed among phases to ease the process of 
calculating instruction times. In the appendices the attempt has been to accurately 
characterize each phase. Second, there are inaccuracies in the handbooks arising from 
conservative timing estimates and engineering revisions. The figures included here 
may be considered more accurate. 

A performance figure is arrived at for each machine by weighting its instruction 
times by frequency. The results, given in Table 4, form the basis of the analyses to 
follow. 
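
The weighting step can be sketched as a frequency-weighted mean. The instruction mix and times below are made up for illustration and are not the Appendix A or Appendix B data; only the method follows the text.

```python
def average_instruction_time(frequencies, times_usec):
    """Frequency-weighted mean execution time.

    frequencies: mapping of instruction (or phase case) -> relative frequency
    times_usec:  mapping of the same keys -> execution time in microseconds
    """
    total_freq = sum(frequencies.values())
    weighted = sum(frequencies[k] * times_usec[k] for k in frequencies)
    return weighted / total_freq


# Made-up three-instruction mix, for illustration only.
example_freq = {"MOV": 0.5, "ADD": 0.3, "JSR": 0.2}
example_time = {"MOV": 3.0, "ADD": 4.0, "JSR": 6.0}
example_avg = average_instruction_time(example_freq, example_time)
```

Because the result is a simple weighted sum, the effect of a proposed implementation change can be estimated by editing the affected times and recomputing.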

                          Fetch        Source   Dest.    Execute      Total    Speed Relative
                                                                               to LSI-11

LSI-11                    2.514        0.689    1.360    1.320        5.883    1.000
PDP-11/04                 1.940        0.610    0.811    0.682        4.043    1.455
PDP-11/10                 1.500        0.573    0.929    1.094        4.096    1.436
PDP-11/20                 1.490        0.468    0.802    0.768        3.529    1.667
PDP-11/34                 1.630        0.397    0.538    0.464        3.029    1.942
PDP-11/40                 0.958        0.260    0.294    0.575        2.087    2.819
PDP-11/45                 0.363        0.101    0.213    0.185        0.863    6.820
  (bipolar memory)
PDP-11/60                 [illegible]  0.185    0.218    [illegible]  1.578    3.727
  (87% cache hit ratio)

Table 4: Average PDP-11 Instruction Execution Times in Microseconds
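
The internal consistency of Table 4 can be checked with a few lines of Python: the four phase times should sum to the total, and the relative speed should equal the LSI-11 total divided by each machine's total. The sketch assumes the fourth column is the execute phase, and it omits the 11/60 because two of its entries are not legible.

```python
# Phase times (fetch, source, destination, execute) and totals in
# microseconds, copied from Table 4 for the rows with all four phases.
table4 = {
    "LSI-11":    ((2.514, 0.689, 1.360, 1.320), 5.883),
    "PDP-11/04": ((1.940, 0.610, 0.811, 0.682), 4.043),
    "PDP-11/10": ((1.500, 0.573, 0.929, 1.094), 4.096),
    "PDP-11/20": ((1.490, 0.468, 0.802, 0.768), 3.529),
    "PDP-11/34": ((1.630, 0.397, 0.538, 0.464), 3.029),
    "PDP-11/40": ((0.958, 0.260, 0.294, 0.575), 2.087),
    "PDP-11/45": ((0.363, 0.101, 0.213, 0.185), 0.863),
}

lsi11_total = table4["LSI-11"][1]
phase_sum = {name: sum(phases) for name, (phases, _) in table4.items()}
relative_speed = {name: lsi11_total / total for name, (_, total) in table4.items()}

for name in table4:
    print(f"{name:10s} sum={phase_sum[name]:.3f} speed={relative_speed[name]:.3f}")
```

The phase sums agree with the tabulated totals to within rounding of the last digit.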


6.2 Analysis of Variance of PDP-11 Performance: Top-Down Approach

The first method of analysis described above will be employed in an attempt to 
explain most of the variance in PDP-11 performance in terms of two parameters: 

1) Microcycle time - The microcycle time is used as a measure of processor
performance which excludes the effect of the memory subsystem. 

2) Memory read pause time - The memory read pause time is defined as the 
period of time during which the processor clock is suspended during a 
memory read. For machines with processor/UNIBUS overlap, the clock is 
assumed to be turned off by the same microinstruction which initiates the 
memory access. Memory read pause time is used as a measure of the 
memory subsystem’s impact on processor performance. Note that this
time is less than the memory access time since all PDP-11 processor 
clocks will continue to run at least partially concurrently with a memory 
access. 

The choice of these two factors is motivated by their dominant contribution to, and
(approximately) linear relationship with, performance. Keeping the number of 
independent variables low is also important due to the small number of data points 
being fit to the model. 

The model itself is of the form: 

$$t_i = k_1 c_{1i} + k_2 c_{2i}$$

where $t_i$ is the average instruction execution time of machine $i$ from Table 4,
$c_{1i}$ is the microcycle time of machine $i$ (for machines with selectable
microcycle times, the predominant time is used), and
$c_{2i}$ is the memory read pause time of machine $i$.

This model is only an approximation since it assumes $k_1$ and $k_2$ will be constant
over all machines. In general this will not be the case. $k_1$ is the number of
microcycles expected in a canonical instruction. This number will be a function mainly
of data path connectivity and strictly speaking another factor should be included to
take that variability into account; however, since the data path organizations of all
PDP-11 implementations considered here (excepting the 11/03, 11/45, and 11/60) are
quite comparable, the simplifying assumption of calling them all identical at the price of
explaining somewhat less of the variance shall be made. $k_2$ is the number of memory
accesses expected in a canonical instruction and also exhibits some variability from
machine to machine. A small part of this is due to the fact that some PDP-11s actually
take more memory cycles to perform a given instruction than do others (this is really
only a factor in certain 11/10 instructions, notably JMP and JSR, and the 11/20 MOV
instruction). A more important source of variability is the UNIBUS/processor overlap
logic incorporated into some PDP-11 implementations which effectively reduces the
actual contribution of the $k_2 c_{2i}$ term by overlapping more memory access time with
processor operation than is excluded from the memory read pause time.

Given the model and the dependent and independent data for each machine as
given in Table 5, a linear regression was applied to determine the coefficients $k_1$ and
$k_2$ and to find out how much of the variance is explained by the model.
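
A regression of this form can be sketched in a few lines of Python by solving the 2 x 2 normal equations directly. The sketch is an assumption-laden illustration rather than the authors' procedure: it fits a model with no intercept term, computes R^2 about the mean (the text does not say which definition was used), and is applied only to the six machines whose Table 5 entries are fully legible in this copy. That subset is not one of those analyzed in the text, so its coefficients should not be expected to match the published ones.

```python
def fit_two_term_model(c1, c2, t):
    """Least-squares fit of t = k1*c1 + k2*c2 (no intercept term).

    Solves the 2 x 2 normal equations directly. Returns (k1, k2, r2), where
    r2 is computed about the mean of t (an assumption, see the lead-in).
    """
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1t = sum(a * y for a, y in zip(c1, t))
    s2t = sum(b * y for b, y in zip(c2, t))
    det = s11 * s22 - s12 * s12
    k1 = (s1t * s22 - s12 * s2t) / det
    k2 = (s11 * s2t - s12 * s1t) / det
    mean_t = sum(t) / len(t)
    ss_res = sum((y - (k1 * a + k2 * b)) ** 2 for a, b, y in zip(c1, c2, t))
    ss_tot = sum((y - mean_t) ** 2 for y in t)
    return k1, k2, 1.0 - ss_res / ss_tot


# LSI-11, 11/04, 11/10, 11/20, 11/34, 11/40; all times in microseconds.
microcycle = [0.400, 0.260, 0.300, 0.280, 0.180, 0.140]
read_pause = [0.400, 0.940, 0.600, 0.370, 0.940, 0.500]
avg_time = [5.883, 4.043, 4.096, 3.529, 3.029, 2.087]
k1, k2, r2 = fit_two_term_model(microcycle, read_pause, avg_time)
```

With only six to eight data points, adding or dropping a single machine moves the coefficients noticeably, which is one reason the text keeps the number of independent variables to two.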

Applying the regression over all eight processors: $k_1 = 11.580$, $k_2 = 1.162$,
$R^2 = 0.904$. $R^2$ is the amount of variance accounted for by the model, or 90.4%.
If the regression is applied to just the six midrange processors: $k_1 = 10.896$,
$k_2 = 1.194$, $R^2 = 0.962$. $R^2$ increases to 96.2% partly because fewer data points
are being fit to the model and partly because the LSI-11 and 11/45 can be expected to
have different k coefficients than the midrange machines and hence don’t fit the model
as well. Note that if two midrange machines, the 11/04 and the 11/40, are eliminated
instead of the LSI-11 and 11/45, then $R^2$ decreases to 89.3% rather than increasing.
The k coefficients are close to what should be expected for average microcycle and
memory cycle counts. Since $k_1$ is much larger than $k_2$, average instruction time is
more sensitive to microcycle time than to memory read pause time by a factor of
$k_1/k_2$, or approximately 10. The implication for the designer is that much more
performance can be gained or lost by perturbing the microcycle time than the memory
read pause time.

                          Independent Variables         Dependent Variable

                          Microcycle    Memory Read     Average Instruction
                          Time          Pause Time      Execution Time

LSI-11                    0.400         0.400           5.883
PDP-11/04                 0.260         0.940           4.043
PDP-11/10                 0.300         0.600           4.096
PDP-11/20                 0.280         0.370           3.529
PDP-11/34                 0.180         0.940           3.029
PDP-11/40                 0.140         0.500           2.087
PDP-11/45                 0.150         [missing]       0.863
  (bipolar memory)
PDP-11/60                 0.170         [illegible]     1.578
  (87% cache hit ratio)

Table 5: Top-Down Model Parameters in Microseconds

Although this method lacks statistical rigor, it is reasonably safe to say that 
memory and microcycle speed do have by far the largest impact on performance and 
that the dependency is quantifiable to some degree. 
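
To show how the fitted model might be used in initial planning, the sketch below plugs the published all-eight-processor coefficients into the model and compares the estimate with the tabulated average for two machines. The function name is hypothetical; the coefficients and machine parameters are those quoted in this section.

```python
K1 = 11.580   # microcycles per canonical instruction (fit over all eight)
K2 = 1.162    # memory reads per canonical instruction (fit over all eight)


def predicted_time(microcycle_usec, read_pause_usec):
    """Top-down estimate of average instruction time in microseconds."""
    return K1 * microcycle_usec + K2 * read_pause_usec


# (microcycle time, read pause time, tabulated average), in microseconds.
machines = {
    "LSI-11":    (0.400, 0.400, 5.883),
    "PDP-11/40": (0.140, 0.500, 2.087),
}
for name, (c1, c2, actual) in machines.items():
    print(f"{name}: predicted {predicted_time(c1, c2):.3f}, tabulated {actual:.3f}")

sensitivity_ratio = K1 / K2   # roughly 10, as noted in the text
```

The estimate for the LSI-11 falls well short of its tabulated time, which is consistent with the remark that the LSI-11 should have different k coefficients than the midrange machines.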


6.3 Measuring Second-Order Effects: Bottom-Up Approach

It is a great deal harder to measure the effect of other design tradeoffs on 
performance. The approximate methods employed in the previous section cannot be 
used because the effects being measured tend to be swamped out by first-order 
effects and often either cancel or reinforce one another making linear models useless. 
For these reasons such tradeoffs must be evaluated on a design-by-design basis as 
explained above. This subsection will evaluate several design tradeoffs in this way. 


6.3.1 Effect of Adding a Byte Swapper to the 11/10 

It is evident that the lack of a byte swapper on the PDP-11/10 has a negative 
effect on performance. In this subsection the performance gained by the addition of a 
byte swapper either before the B register or as part of the Bleg multiplexor is 
calculated. Adding a byte swapper would change five different parts of the instruction 
interpretation process: the source and destination phases where an odd-byte operand 
is read from memory, the execute phase where a swap byte instruction is executed in 
destination mode 0 and in destination modes 1 through 7, and the execute phase where 
an odd-byte address is modified. In each of these cases seven fast shift cycles would 
be eliminated and the remaining normal-speed shift cycle could be replaced by a byte 
swap cycle resulting in a savings of seven fast shift cycles or 1.050 µsec. None of this
time is overlapped with UNIBUS operations; hence, all would be saved. This savings is
realized, however, only when a byte swap or odd-byte access is actually performed.
The frequency with which this occurs is just the sum of the frequencies of the
individual cases noted above or 0.0640. Multiplying this by the time saved per occurrence
gives a savings of 0.0672 µsec or 1.64% of the average instruction execution time. The
insignificance of this savings could well be used to support the decision for leaving the 
byte swapper out of the PDP-11/10. 
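
The byte swapper arithmetic can be reproduced directly. The fast shift cycle time of 0.150 microseconds is inferred from the statement that seven such cycles take 1.050 microseconds; the other figures are quoted from the text and Table 4.

```python
FAST_SHIFT_CYCLE_USEC = 0.150   # inferred: 7 cycles = 1.050 usec
CYCLES_SAVED = 7
CASE_FREQUENCY = 0.0640         # byte swaps and odd-byte accesses per instruction
AVG_TIME_11_10_USEC = 4.096     # Table 4 total for the PDP-11/10

saved_per_occurrence = CYCLES_SAVED * FAST_SHIFT_CYCLE_USEC
saved_per_instruction = CASE_FREQUENCY * saved_per_occurrence
fraction_of_time = saved_per_instruction / AVG_TIME_11_10_USEC

print(f"{saved_per_occurrence:.3f} usec per occurrence, "
      f"{saved_per_instruction:.4f} usec per instruction, "
      f"{100 * fraction_of_time:.2f}% of average time")
```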


6.3.2 Effect of Adding Processor/UNIBUS Overlap to the 11/04

Processor/UNIBUS overlap is not a feature of the 11/04 control unit. Adding 
this feature involves altering the control unit/UNIBUS synchronization logic so that the 
processor clock continues to run until a microcycle requiring the UNIBUS data from a 
DATI or DATIP is detected. A bus address register must also be added to drive the 
UNIBUS lines after the microcycle initiating the DATI/P is completed. This alteration 
allows time to be saved in two ways. First, processor cycles may be overlapped with 
memory read cycles as explained in Subsection 3.1.2. Second, since UNIBUS data is not 
read into the data paths during the cycle in which the DATI/P occurs, the path from 
the ALU through the AMUX and back to the registers is freed. This permits certain 
operations to be performed in the same cycle as the DATI/P. For example, the
microword BA←PC; DATI; PC←PC+2 could be used to start fetching the word pointed to
by the PC while simultaneously incrementing the PC to address the next word. The 
cycle following could then load the UNIBUS data directly into a scratchpad register 
rather than loading the data into the Breg and then into the scratchpad on the 
following cycle as is necessary without overlap logic. A savings of two microcycle 
times would result. 

DATI and DATIP operations are scattered liberally throughout the 11/04
microcode; however, only those cycles in which an overlap would produce a time
savings need be considered. An average of 0.730 cycles can be saved or overlapped
during each instruction. If all of the overlapped time is actually saved, then 0.190 µsec
or 4.70% will be pared from the average instruction execution time. This amounts to a
4.93% increase in performance.
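
The two percentages differ because one is a fraction of time saved and the other is an increase in instruction rate. A worked version follows, assuming the 0.190 microsecond figure is the 0.730 cycles multiplied by the 11/04 microcycle time of 0.260 microseconds from Table 5, and taking 4.043 microseconds as the 11/04 average from Table 4:

```latex
\Delta t = 0.730 \times 0.260\ \mu\text{sec} \approx 0.190\ \mu\text{sec}

\frac{\Delta t}{t} = \frac{0.190}{4.043} \approx 4.70\%

\frac{t}{t - \Delta t} - 1 = \frac{4.043}{4.043 - 0.190} - 1 \approx 4.93\%
```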


6.3.3 Effect of Caching on the 11/60 

The PDP-11/60 uses a cache to decrease its effective memory read pause time. 
The degree to which this time is reduced depends upon three factors: the cache read 
hit pause time, the cache read miss pause time, and the ratio of cache read hits to total 
memory read accesses. A write-through cache is assumed; therefore, the timing of 
memory write accesses is not affected by caching and only read accesses need be 
considered. The performance of the 11/60 as measured by average instruction 
execution time is modeled exactly as a function of the above three parameters by the 
equation: 


$$t = k_1 + k_2 (k_3 a + k_4 [1 - a])$$

where $t$ is the average instruction execution time,
$a$ is the cache hit ratio,
$k_1$ is the average execution time of a PDP-11/60 instruction excluding
memory read pause time but including memory write pause time
(1.339 µsec),
$k_2$ is the number of memory reads per average instruction (1.713),
$k_3$ is the memory read pause time for a cache hit (0.000 µsec), and
$k_4$ is the memory read pause time for a cache miss (1.075 µsec).

The above equation can be rearranged to yield:


$$t = (k_1 + k_2 k_4) - k_2 (k_4 - k_3) a$$

The first term and the coefficient of the second term in the equation above 
evaluate to 3.181 µsec and 1.842 µsec respectively with the given k parameter values.
This reduces the average instruction time to a function of the cache hit ratio making it 
possible to compare the effect of various caching schemes on 11/60 performance in 
terms of this one parameter. 

The effect of various cache organizations on the hit ratio is described for the 
PDP-11 family in general in [Stre76a] and for the PDP-11/60 in particular in [Mudg77]. 
If no cache is provided, the hit ratio is effectively zero and the average instruction 
execution time reduces to the first term in the model or 3.181 µsec. A set associative
cache with a set size of 1 word and a cache size of 1024 words has been found
through simulation to give a .87 hit ratio. An average instruction time of 1.578 µsec
results for a 101.52% improvement in performance over that without the cache.

The cache organization described above is that actually employed in the 11/60. 
It has the virtue of being relatively simple to implement and therefore reasonably
inexpensive. Set size or cache size can be increased to attain a higher hit ratio at a
correspondingly higher cost. One alternative cache organization is a set size of 2
words and a cache size of 2048 words. This organization boosts the hit ratio to .93
resulting in an instruction time of 1.468 µsec, an increase in performance of 7.53%.
This increased performance must be paid for, however, since twice as many memory 
chips are needed. Because the performance increment derived from the second cache 
organization is much smaller than that of the first while the cost increment is 
approximately the same, the first organization is more cost effective. 
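
The cache model of this subsection is simple enough to evaluate directly. The sketch below uses the four k parameters quoted in the text and evaluates the model at hit ratios of 0, .87, and .93; the function names are hypothetical. Because the quoted parameters are rounded to three decimals, the results agree with the figures in the text only to within about a unit in the last digit.

```python
K1_USEC = 1.339        # instruction time excluding memory read pauses
K2_READS = 1.713       # memory reads per average instruction
K3_HIT_USEC = 0.000    # read pause on a cache hit
K4_MISS_USEC = 1.075   # read pause on a cache miss


def avg_instruction_time(hit_ratio):
    """Average 11/60 instruction time in microseconds for a given hit ratio."""
    pause = K3_HIT_USEC * hit_ratio + K4_MISS_USEC * (1.0 - hit_ratio)
    return K1_USEC + K2_READS * pause


def improvement_percent(old_time, new_time):
    """Performance increase when instruction time drops from old to new."""
    return 100.0 * (old_time / new_time - 1.0)


no_cache = avg_instruction_time(0.00)
cache_1k = avg_instruction_time(0.87)   # set size 1, 1024 words
cache_2k = avg_instruction_time(0.93)   # set size 2, 2048 words
```

Evaluating `improvement_percent(no_cache, cache_1k)` and `improvement_percent(cache_1k, cache_2k)` shows the diminishing return that makes the first organization the more cost effective of the two.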


6.3.4 Design Tradeoffs Affecting the Fetch Phase 

The fetch phase holds much potential for performance improvement since it 
consists of a single short sequence of microoperations that, as Table 4 clearly shows, 
involves a sizable fraction of the average instruction time due to the inevitable 
memory access and possible service operations. In this subsection two approaches to 
cutting this time are evaluated for four different processors. 

The UNIBUS interface logic of the PDP-11/04 and 11/34 is very similar. Both
insert a delay into the initial microcycle of the fetch phase to allow time for bus grant
arbitration circuitry to settle so that a microbranch can be taken if a serviceable
condition exists. If the arbitration logic were redesigned to eliminate this delay, the
average instruction execution time would drop by 0.220 µsec for the 11/04 and 0.150
µsec for the 11/34*. The resulting increases in performance would be 5.75% and
5.21% respectively.

Another example of a design feature affecting the fetch phase is the operand/
instruction fetch overlap mechanism of the 11/40, 11/45, and 11/60. From the normal
fetch times in the appendices and the actual average fetch times given in Table 4, the
savings in fetch phase time alone can be calculated to be 0.162 µsec for the 11/40,
0.087 µsec for the 11/45, and 0.118 µsec for the 11/60 or an increase of 7.77%,
10.07%, and 8.11% over what their respective performances would be if fetch phase
time were not overlapped.

These examples demonstrate the practicality of optimizing sequences of control
states that have a high frequency of occurrence rather than just those which have
long durations. The 11/10 byte swap logic is quite slow, but is utilized infrequently
causing its impact upon performance to be small while the bus arbitration logic of the
11/34 exacts only a small time penalty, but does so each time an instruction is
executed and results in a larger performance impact. The usefulness of frequency
data should thus be apparent since the bottlenecks in a design are often not where
intuition says they should be.


* These figures are typical. Since the delay is set by an RC circuit and Schmitt trigger,
the delay may vary considerably from machine to machine of a given model.
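
The frequency argument of Section 6.3.4 can be made concrete by putting the byte swapper and the arbitration delay on the same footing: time saved per occurrence, multiplied by occurrences per instruction, expressed as a performance gain. The figures are those quoted in Section 6; treating the arbitration delay as occurring once per instruction follows the text, and the function name is hypothetical. Note that the byte swapper result is stated here as a performance gain, so it comes out slightly above the 1.64% fraction of time quoted in Section 6.3.1.

```python
# (time saved per occurrence in usec, occurrences per instruction,
#  average instruction time in usec)
tradeoffs = {
    "11/10 byte swapper":      (1.050, 0.0640, 4.096),
    "11/34 arbitration delay": (0.150, 1.0,    3.029),
    "11/04 arbitration delay": (0.220, 1.0,    4.043),
}


def performance_gain_percent(saved_per_occurrence, frequency, avg_time):
    """Percent performance increase from removing a frequency-weighted delay."""
    saved = saved_per_occurrence * frequency
    return 100.0 * (avg_time / (avg_time - saved) - 1.0)


gains = {name: performance_gain_percent(*row) for name, row in tradeoffs.items()}
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gain:.2f}%")
```

The longest single delay (1.050 microseconds) ranks last once it is weighted by how rarely it occurs.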


7. Summary and Use of the Methodologies

The PDP-11 offers an interesting opportunity to examine an architecture with 
numerous implementations spanning a wide range of price and performance. The 
implementations appear to fall into three distinct categories: the midrange machines 
(PDP-11/04/10/20/34/40/60), an inexpensive, relatively low-performance machine
(LSI-11), and a comparatively expensive, but high-performance machine (PDP-11/45). 
The midrange machines are all minor variations on a common theme with each 
implementation introducing much less variability than might be expected. Their 
differences reside in the presence or absence of certain embellishments rather than in 
any major structural differences. This common design scheme is still quite 
recognizable in the LSI-11 and even in the PDP-11/45. The deviations of the LSI-11 
arise from limitations imposed by semiconductor technology rather than directly from 
cost or performance considerations although the technology decision derives from cost. 
In the PDP-11/45, on the other hand, the quantum jump in complexity is purely 
motivated by the desire to squeeze the maximum performance out of the architecture. 

From the overall performance model presented in Section 6.2, it is evident that 
instruction stream processing can be speeded up either by improving the performance 
of the memory subsystem or the performance of the processor. Memory subsystem 
performance depends upon the number of memory accesses in a canonical instruction and 
the effective memory read pause time. There is not much that can be done about the 
first number since it is a function of the architecture and thus largely fixed. The 
second number may be improved, however, by the use of faster memory components 
or techniques such as caching. 

Performance of the PDP-11 processor itself can be enhanced in two ways: by 
cutting the number of processor cycles to perform a given function or by cutting the 
time used per microcycle. Several approaches to decreasing the effective microcycle 
count have been demonstrated: 

1) Structure the data paths for maximum parallelism - The PDP-11/45 can 
perform much more in a given microcycle than any of the midrange 
PDP-11s and thus needs fewer microcycles to complete an instruction. To 
obtain this increased functionality, however, a much more elaborate set of 
data paths is required in addition to a highly developed control unit to 
exercise them to maximum potential. Such a change is not an incremental 
one and involves rethinking the entire implementation. 

2) Structure the microcode to take best advantage of instruction features - 
All processors except the 11/10 handle JMP/JSR addressing modes as a 
special case in the microcode. Most do the same for the destination 
modes of the MOV instruction because of its high frequency. Varying 
degrees of sophistication in instruction dispatching from the BUT 
IRDECODE at the end of every fetch are evident in different machines, 
resulting in various performance improvements. 

3) Cut effective microcycle count by overlapping processor and UNIBUS 
operation - The PDP-11/10 demonstrates that a large microcycle count 
can be effectively reduced by placing cycles in parallel with memory 
access operations whenever possible. 

Increasing microcycle speed is perhaps more generally useful since it can often 
be applied without making substantial changes to an entire implementation. Several of 
the midrange PDP-11s achieve most of their performance improvement by increasing 
microcycle speed in the following ways: 

1) Make the data paths faster - The PDP-11/34 demonstrates the 
improvement in microcycle time that can result from the judicious use of 
Schottky TTL in such heavily travelled points as the ALU. Replacing the 
ALU and carry-lookahead logic alone with Schottky equivalents saves 
approximately 35 nanoseconds in propagation delay. With cycle times 
running 300 nanoseconds and less, this amounts to better than a 10% 
increase in speed. 

2) Make each microcycle take only as long as necessary - The 11/34 and 
11/40 both use selectable microcycle times to speed up cycles which 
don't entail long data path propagation delays. 
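
The arithmetic behind both points can be sketched briefly. The code assumes that speed is inversely proportional to microcycle time. The 300 ns and 35 ns figures and the 180/240 ns cycle times of the 11/34 come from this paper, while the 30% share of long cycles is a purely hypothetical value chosen for illustration.

```python
# Sketch: effect of microcycle-time changes on processor speed.
# Assumption: speed is inversely proportional to microcycle time.

def speed_increase(cycle_ns, saved_ns):
    """Fractional speed gain from shortening every microcycle by saved_ns."""
    return cycle_ns / (cycle_ns - saved_ns) - 1.0


def mean_cycle(short_ns, long_ns, long_fraction):
    """Average microcycle time when only some cycles need the long period."""
    return long_fraction * long_ns + (1.0 - long_fraction) * short_ns


# 35 ns saved out of a 300 ns cycle, as in the 11/34 example.
gain = speed_increase(300.0, 35.0)

# 11/34 short/long cycles of 180/240 ns; the 30% long-cycle share is hypothetical.
average = mean_cycle(180.0, 240.0, 0.30)

print(f"speed gain {100 * gain:.1f}%, mean cycle {average:.0f} ns")
```

With shorter cycle times the same 35 ns saving yields a proportionally larger gain, which is why the text speaks of "better than" 10%.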

Circuit technology is perhaps the single most important factor in performance. 
It is only stating the obvious to say that doubling circuit speed will double total 
performance. Aside from raw speed, circuit technology dictates what it is economically 
feasible to build as witnessed by the SSI PDP-11/20, the MSI PDP-11/40, and the 
LSI-11. Just the limitations of a particular circuit technology at a given point in time 
may dictate much about the design tradeoffs that can be made as in the case of the 
LSI-11. 

Turning to the methodologies, the two presented in Section 6 can be used at 
various times during the design cycle. The top-down approach can be used to 
estimate the performance of a proposed implementation, or to plan a family of 
implementations, given only the characteristics of the selected technology and a 
general estimate of data path and memory cycle utilization. 

The bottom-up approach can be used to perturb an existing or planned design 
to determine the performance payoff of a particular design tradeoff. The relative 
frequencies of each function (e.g. addressing modes, instructions, etc.), while required 
for an accurate prediction, may not be available. There are, however, alternative ways 
to estimate relative frequencies. Consider the three following situations: 

1) At least one implementation exists - An analysis of the implementation in 
typical usage (i.e. benchmark programs for a stored-program computer) 
can provide the relative frequencies. 

2) No implementation exists, but similar systems exist - The frequency data 
may be extrapolated from measurements made on a machine with a similar 
architecture. For example, the Gibson Mix [Bell71] provided the relative 
frequencies of IBM 7090 functions from which the relative frequencies of 
IBM 360 functions were estimated. 
3) No implementation exists and there are no prior similar systems - From 
knowledge of the specifications, a set of most-used functions can be 
estimated (e.g. instruction fetch, register and relative addressing, move 
and add instructions for a stored-program computer). The design is then 
optimized for these functions. 

Of course, the relative frequency data should always be updated to take into account 
new data. 
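
For the first situation, deriving relative frequencies from an execution trace and updating them as further measurements arrive might look like the following minimal sketch. The traces shown are hypothetical placeholders, not measured data, and the function name is illustrative.

```python
from collections import Counter


def relative_frequencies(counts):
    """Convert raw occurrence counts into relative frequencies."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical traces of executed opcodes from two benchmark runs.
first_trace = ["MOV", "ADD", "MOV", "BNE", "CMP", "BNE", "MOV", "DEC"]
later_trace = ["MOV", "JSR", "TST", "BEQ"]

counts = Counter(first_trace)
initial = relative_frequencies(counts)

# New data is folded into the running counts, then frequencies are recomputed.
counts.update(later_trace)
updated = relative_frequencies(counts)

print(f"MOV frequency: {initial['MOV']:.3f} initially, {updated['MOV']:.3f} after update")
```

Keeping raw counts rather than frequencies makes the update step a simple addition.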

Our purpose in writing this paper has been two-fold: to provide data about 
design tradeoffs and to suggest design methodologies based on this data. It is hoped 
that the design data will stimulate the study of other methodologies while the results 
of the design methodologies presented here have demonstrated their usefulness to 
designers. 
Figure 6: Archetypical Microprogrammed PDP-11 Control Unit 

Figure 7: PDP-11/20 Data Paths 

Note: All data paths are 16 bits wide unless otherwise indicated. 

Figure 9: PDP-11/40 Data Paths 

Note: All data paths are 16 bits wide unless otherwise indicated. 

Figure 10: PDP-11/10 Data Paths 

Note: All data paths are 16 bits wide unless otherwise indicated. 

Figure 11: PDP-11/04 Data Paths 

Note: All data paths are 16 bits wide unless otherwise indicated. 

Figure 12: PDP-11/34 Data Paths 

Note: All data paths are 16 bits wide unless otherwise indicated. 

Figure 13: PDP-11/60 Data Paths 

Note: All data paths are 16 bits wide unless otherwise indicated. 

Figure 14: LSI-11 Data Paths 

Note: All data paths are 8 bits wide unless otherwise indicated. 

Figure 15: PDP-11/45 Data Paths 

Note: All data paths are 16 bits wide unless otherwise indicated. 

Introduction to the Appendices 


Appendix A tabulates the frequencies of PDP-11 instructions and addressing 
modes. This data was derived as explained in Subsection 6.1. Frequencies are given 
for the occurrence of each phase (e.g. source, which occurs only during double- 
operand instructions), each subcase of each phase (e.g. jump destination, which occurs 
only during jump or jump to subroutine instructions), and each instance of each phase 
such as a particular addressing mode or instruction. The frequency with which the 
phase is skipped is listed for source and destination phases. Source and destination 
odd-byte-addressing frequencies are listed as well due to their effect on instruction 
timing. 


Appendices B through I tabulate the calculated instruction execution times for 
all the PDP-11 processors reviewed here. These calculations have been made 
assuming certain processor and memory timing characteristics described at the end of 
each appendix. Normal timing variations from machine to machine can be significant; 
therefore, the times given here can only be taken as typical. 
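
As an illustration of how the two kinds of appendix combine, the sketch below weights the source-phase times of two processors by the source-mode frequencies of Appendix A. The result is the average source-phase time per executed instruction. Byte and odd-byte adjustments from the table notes are deliberately ignored, so the figures are approximations, and the function name is hypothetical.

```python
# Source-mode frequencies from Appendix A (modes 0-7). They sum to 0.4069,
# the fraction of instructions that have a source phase at all.
SOURCE_FREQ = [0.1377, 0.0338, 0.1587, 0.0122, 0.0352, 0.0000, 0.0271, 0.0022]

# Source-phase times in usec for modes 0-7 (Appendix B and Appendix H).
LSI11_SOURCE = [0.400, 1.600, 2.000, 3.600, 2.400, 4.000, 4.400, 6.000]
PDP1145_SOURCE = [0.000, 0.300, 0.300, 0.750, 0.450, 0.900, 0.600, 1.050]


def expected_time(frequencies, times_usec):
    """Frequency-weighted phase time per executed instruction, in usec."""
    return sum(f * t for f, t in zip(frequencies, times_usec))


lsi11 = expected_time(SOURCE_FREQ, LSI11_SOURCE)
pdp1145 = expected_time(SOURCE_FREQ, PDP1145_SOURCE)

print(f"LSI-11 source phase:    {lsi11:.4f} usec per instruction")
print(f"PDP-11/45 source phase: {pdp1145:.4f} usec per instruction")
```

The same weighting applied to every phase, and summed, gives an average instruction time for a processor.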

Appendix A: Instruction Time Component Frequencies 

Phase                               Frequency

Fetch                               1.0000

Source                              0.4069
  Mode
  0  R                              0.1377
  1  @R or (R)                      0.0338
  2  (R)+                           0.1587
  3  @(R)+                          0.0122
  4  -(R)                           0.0352
  5  @-(R)                          0.0000
  6  X(R)                           0.0271
  7  @X(R)                          0.0022
  No Source                         0.5931
  NOTE: Frequency of odd-byte addressing (SM1-7) = 0.0252.

Destination                         0.6872
  Data Manipulation                 0.6355
    Mode
    0  R                            0.3146
    1  @R or (R)                    0.0599
    2  (R)+                         0.0854
    3  @(R)+                        0.0307
    4  -(R)                         0.0823
    5  @-(R)                        0.0000
    6  X(R)                         0.0547
    7  @X(R)                        0.0080
    NOTE: Frequency of odd-byte addressing (DM1-7) = 0.0213.
  Jump (JMP/JSR)                    0.0517
    Mode
    0  R                            0.0000 (ILLEGAL)
    1  @R or (R)                    0.0000
    2  (R)+                         0.0000
    3  @(R)+                        0.0079
    4  -(R)                         0.0000
    5  @-(R)                        0.0000
    6  X(R)                         0.0438
    7  @X(R)                        0.0000
  No Destination                    0.3128

Execute                             1.0000
  Instruction
  Double Operand                    0.4069
    ADD                             0.0524
    SUB                             0.0274
    BIC                             0.0309
    BICB                            0.
    BIS                             0.0012
    BISB                            0.0013
    CMP                             0.0626
    CMPB                            0.0212
    BIT                             0.0041
    BITB                            0.0014
    MOV                             0.1517
    MOVB                            0.0524
    XOR                             0.
  Single Operand                    0.2286
    CLR                             0.0186
    CLRB                            0.0018
    COM                             0.
    COMB                            0.
    INC                             0.0224
    INCB                            0.
    DEC                             0.0809
    DECB                            0.
    NEG                             0.0038
    NEGB                            0.
    ADC                             0.0070
    ADCB                            0.
    SBC                             0.
    SBCB                            0.
    ROR                             0.0036
    RORB                            0.
    ROL                             0.0059
    ROLB                            0.
    ASR                             0.0069
    ASRB                            0.
    ASL                             0.0298
    ASLB                            0.
    TST                             0.0329
    TSTB                            0.0079
    SWAB                            0.0038
    SXT                             0.
  Branch                            0.2853
    All Branches (true)             0.1744
    All Branches (false)            0.1109
    SOB (true)                      0.
    SOB (false)                     0.
  Jump                              0.0517
    JMP                             0.0272
    JSR                             0.0245
  Control, Trap, and Miscellaneous  0.0270
    Set/Clear Condition Codes       0.0017
    MARK                            0.
    RTS                             0.0236
    RTI                             0.
    RTT                             0.
    IOT                             0.
    EMT                             0.0017
    TRAP                            0.
    BPT                             0.

NOTES: 

Frequency of destination odd-byte addressing (DM1-7) = 0.0213. 

Execution frequencies indicated as 0. have an aggregate frequency < 0.0050. 
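
A quick consistency check on these frequencies is that the addressing-mode entries of each phase should add up to the phase total, and the instruction classes should add up to nearly 1. The sketch below performs that check on the values of this appendix. The tolerance is an assumption chosen to absorb rounding and the entries printed only as "0.".

```python
# Sketch: check that Appendix A sub-frequencies add up to their phase totals.
TOLERANCE = 0.001  # allows for rounding and for entries printed as "0."

source_modes = [0.1377, 0.0338, 0.1587, 0.0122, 0.0352, 0.0000, 0.0271, 0.0022]
dest_modes = [0.3146, 0.0599, 0.0854, 0.0307, 0.0823, 0.0000, 0.0547, 0.0080]
jump_modes = [0.0000, 0.0000, 0.0000, 0.0079, 0.0000, 0.0000, 0.0438, 0.0000]
instruction_classes = [0.4069, 0.2286, 0.2853, 0.0517, 0.0270]

# Each entry pairs a computed sum with the total listed in the appendix.
checks = {
    "source modes vs Source": (sum(source_modes), 0.4069),
    "destination modes vs Data Manipulation": (sum(dest_modes), 0.6355),
    "jump modes vs Jump (JMP/JSR)": (sum(jump_modes), 0.0517),
    "Destination + No Destination": (0.6872 + 0.3128, 1.0),
    "instruction classes vs Execute": (sum(instruction_classes), 1.0),
}

residuals = {name: abs(total - parts) for name, (parts, total) in checks.items()}
for name, r in residuals.items():
    print(f"{name}: residual {r:.4f} {'ok' if r < TOLERANCE else 'CHECK'}")
```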

Appendix B: LSI-11 Instruction Execution Times 

Microcycle Time = 0.400 Microseconds 

                                Memory   Micro-
                                 Reads   Cycles  Time (usec)   Notes

Fetch Time                           1        5        2.400

NOTE: 

The following instructions take additional ucycles to decode: 

All single-operand instructions except SWAB, SXT, MFPS, and MTPS add 1 ucycle (+0.400 usec). 
XOR, JMP, RTS, RTI, RTT, set/clear condition codes add 1 ucycle (+0.400 usec). 
SWAB adds 2 ucycles (+0.800 usec). 
MFPS, MTPS add 4 ucycles (+1.600 usec). 
SXT adds 5 ucycles (+2.000 usec). 
BPT, IOT add 6 ucycles (+2.400 usec). 
MARK adds 8 ucycles (+3.200 usec). 

Source Times
  Mode
  0  R                               0        1        0.400
  1  @R or (R)                       1        3        1.600   (1)
  2  (R)+                            1        4        2.000   (3)
  3  @(R)+                           2        7        3.600   (1)
  4  -(R)                            1        5        2.400   (2)
  5  @-(R)                           2        8        4.000   (1)
  6  X(R)                            2        9        4.400   (1)
  7  @X(R)                           3       12        6.000   (1)

NOTES: 

(1) Byte addressing subtracts 1 ucycle (-0.400 usec). 

(2) Byte addressing adds 1 ucycle (+0.400 usec). 

(3) If register ≠ R6 or R7, byte addressing adds 1 ucycle (+0.400 usec). 

Destination Times
Data Manipulation
  Mode
  0  R                               0        1        0.400
  1  @R or (R)                       1        4        2.000
  2  (R)+                            1        5        2.400   (1)
  3  @(R)+                           2        8        4.000
  4  -(R)                            1        6        2.800   (1)
  5  @-(R)                           2        9        4.400
  6  X(R)                            2       10        4.800
  7  @X(R)                           3       13        6.400

NOTES: 

For MOV: DM0 subtracts 1 ucycle (-0.400 usec). DM1-7 subtracts 2 ucycles and memory read (-1.200 usec). 
Byte addressing (DM1-7) subtracts 1 ucycle (-0.400 usec). 

(1) If register = R6 or R7, byte addressing adds 2 ucycles (+0.800 usec) additive to the time noted directly 
above. 

                                Memory   Memory   Micro-
                                 Reads   Writes   Cycles  Time (usec)   Notes

Jump (JMP/JSR)
  Mode
  0  R                          ILLEGAL
  1  @R or (R)                       0                 3        1.200
  2  (R)+                            0                 5        2.000
  3  @(R)+                           1                 5        2.400
  4  -(R)                            0                 5        2.000
  5  @-(R)                           1                 6        2.800
  6  X(R)                            1                 7        3.200
  7  @X(R)                           2                10        4.800

Execute Times
Instruction

Double Operand
  ADD, SUB, BIC, BIS, XOR                     1        3        1.600   (3)
  BICB, BISB                                  1        2        1.200   (3)
  CMP, BIT                                    0        2        0.800
  CMPB, BITB                                  0        1        0.400
  MOV                                         1        3        1.600   (2)
  MOVB                                        1        2        1.200   (1)

Single Operand
  CLR                                         1        3        1.600   (2)
  CLRB                                        1        3        1.600   (2)
  COM, NEG                                    1        4        2.000   (2)
  COMB, NEGB                                  1        3        1.600   (2)
  INC, DEC, ADC, SBC                          1        5        2.400   (3)
  INCB, DECB, ADCB, SBCB                      1        4        2.000   (3)
  ROR                                         1        8        3.600   (3)
  RORB                                        1        5        2.400   (3)
  ASR                                         1        9        4.000   (3)
  ASRB                                        1        8        3.600   (4)
  ROL, ASL                                    1        4        2.000   (3)
  ROLB, ASLB                                  1        3        1.600   (3)
  TST                                         0        4        1.600
  TSTB                                        0        3        1.200
  SWAB                                        1        3        1.600   (2)
  SXT                                         1        6        2.800   (3)
  MFPS                                        1        8        3.600   (1,9)
  MTPS                                        1       10        4.400   (2,5,5

Branch
  All Branches (true)                                  4        1.600
  All Branches (false)                                 4        1.600
  SOB (true)                                           8        3.200
  SOB (false)                                          6        2.400

Jump
  JMP                                                  2        0.800
  JSR (register = R7)                         1        6        2.800
  JSR (register ≠ R7)                         1       15        6.400

Control, Trap, and Miscellaneous
  Set/Clear Condition Codes                            3        1.200
  MARK                               1                16        6.800
  RTS                                1                 6        2.800
  RTI                                2                15        6.800   (5,6)
  RTT                                2                15        6.800   (5,7)
  IOT, EMT, TRAP, BPT                2        2       33       14.800   (5,8)

NOTES: 

(1) DM0 adds 1 ucycle and subtracts memory write (+0.000 usec). 

(2) DM0 subtracts memory write (-0.400 usec). 

(3) DM0 subtracts 1 ucycle and memory write (-0.800 usec). 

(4) DM0 subtracts 3 ucycles and memory write (-1.600 usec). 

(5) If new PS has bit 7 clear, add 1 ucycle (+0.400 usec). 

(6) If new PS has bit 4 set, add 9 ucycles (+3.600 usec). 

(7) If new PS has bit 4 set, add 10 ucycles (+4.000 usec). 

(8) If new PS has bit 4 set, add 1 ucycle (+0.400 usec). 

(9) Byte instruction. 

(10) Use destination rather than source times. 

NOTE: 

Times given apply to microcode revision 2(4), MICROMs CPI 631-10 (DEC 23-0S8A5) and CPI 631-07 (DEC 
23-087A5). 

Times Assumed for All Calculations 

1) Microcycle time is 0.400 usec. 

2) Microcycle time is extended by 0.400 usec during DATI/DATIP/DATO/DATOB. (Note: 1 extra wait ucycle is 
actually generated for each memory access; however, these ucycles have not been tallied in the microcycle 
counts above.) 
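
The two timing assumptions above are enough to reproduce the LSI-11 table entries: each phase time is the microcycle count times 0.400 usec plus 0.400 usec per memory access. The sketch below checks a handful of rows from this appendix against that rule. The helper name is hypothetical, and only the listed rows are checked.

```python
CYCLE_USEC = 0.400             # assumption 1: microcycle time
ACCESS_EXTENSION_USEC = 0.400  # assumption 2: extension per memory access


def phase_time(cycles, memory_accesses):
    """Phase time in usec from microcycle count and memory accesses."""
    return cycles * CYCLE_USEC + memory_accesses * ACCESS_EXTENSION_USEC


# (memory accesses, microcycles, listed time in usec) from the tables above.
listed = {
    "fetch": (1, 5, 2.400),
    "source mode 3": (2, 7, 3.600),
    "destination mode 7": (3, 13, 6.400),
    "jump mode 7": (2, 10, 4.800),
    "IOT/EMT/TRAP/BPT": (4, 33, 14.800),  # 2 reads + 2 writes
}

errors = {name: abs(phase_time(c, a) - t) for name, (a, c, t) in listed.items()}
for name, err in errors.items():
    print(f"{name}: difference from listed time {err:.3f} usec")
```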

Appendix C: PDP-11/04 Instruction Execution Times 

Microcycle Time = 0.260 Microseconds 

                                Memory   Micro-
                                 Reads   Cycles  Time (usec)

Fetch Time                           1        3        1.940

Source Times
  Mode
  0  R                               0        2        0.520
  1  @R or (R)                       1        2        1.460
  2  (R)+                            1        3        1.720
  3  @(R)+                           2        5        3.180
  4  -(R)                            1        3        1.720
  5  @-(R)                           2        5        3.180
  6  X(R)                            2        6        3.440
  7  @X(R)                           3        8        4.900

NOTE: 

Odd-byte addressing (SM1-7) adds 2 ucycles (+0.520 usec). 

Destination Times
Data Manipulation
  Mode
  0  R                               0        1        0.260
  1  @R or (R)                       1        1        1.200
  2  (R)+                            1        2        1.460
  3  @(R)+                           2        4        2.920
  4  -(R)                            1        2        1.460
  5  @-(R)                           2        4        2.920
  6  X(R)                            2        5        3.180
  7  @X(R)                           3        7        4.640

NOTE: 

Odd-byte addressing (DM1-7) adds 2 ucycles (+0.520 usec). 

Jump (JMP/JSR)
  Mode
  0  R                          ILLEGAL
  1  @R or (R)                       0        2        0.520
  2  (R)+                            0        3        0.780
  3  @(R)+                           1        3        1.720
  4  -(R)                            0        3        0.780
  5  @-(R)                           1        3        1.720
  6  X(R)                            1        4        1.980
  7  @X(R)                           2        6        3.440

                                Memory   Memory   Micro-
                                 Reads   Writes   Cycles  Time (usec)   Notes

Execute Times
Instruction

Double Operand
  ADD, SUB, BIC(B), BIS(B)                    1        2        1.060   (1)
  CMP(B), BIT(B)                              0        1        0.260
  MOV(B)                                      1        2        1.060   (1,2)

Single Operand
  CLR(B), COM(B), INC(B),                     1        2        1.060   (1)
  DEC(B), NEG(B), ADC(B),
  SBC(B)
  ROR(B), ROL(B), ASR(B),                     1        3        1.320   (1)
  ASL(B)
  TST(B)                                      0        1        0.260
  SWAB                                        1        3        1.320   (1)

Branch
  All Branches (true)                                  3        0.780
  All Branches (false)                                 0        0.000

Jump
  JMP                                                  0        0.000
  JSR                                         1        7        2.360

Control, Trap, and Miscellaneous
  Set/Clear Condition Codes                            2        0.520
  RTS                                1                 5        2.240
  RTI                                2                 6        3.440
  IOT, EMT, TRAP, BPT                2        2       12        6.080

NOTES: 

(1) Destination odd-byte addressing (DM1-7) adds 2 ucycles (+0.520 usec). DM0 subtracts memory write 
(-0.540 usec). 

(2) DM0 subtracts 1 additional ucycle (-0.260 usec). 

Times Assumed for All Calculations 

1) Microcycle time is 0.260 usec. 

2) Microcycle time is extended by 0.220 usec by bus priority arbitration delay during BUT SERVICE. 

3) Microcycle time is extended by 0.940 usec during DATI/DATIP (MOS memory). 

4) Microcycle time is extended by 0.540 usec during DATO/DATOB (MOS memory). 

Appendix D: PDP-11/10 Instruction Execution Times 

Microcycle Time = 0.300 Microseconds 

                                Memory   Micro-
                                 Reads   Cycles  Time (usec)   Notes

Fetch Time                           1        5        1.500

Source Times
  Mode
  0  R                               0        2        0.600
  1  @R or (R)                       1        3        1.500
  2  (R)+                            1        5        1.500
  3  @(R)+                           2        7        2.700
  4  -(R)                            1        4        1.500
  5  @-(R)                           2        6        2.700
  6  X(R)                            2        7        2.700
  7  @X(R)                           3        9        3.900

NOTE: 

Odd-byte addressing (SM1-7) adds 7 fast shift (0.150 usec/ucycle) and 1 regular ucycle for a total of +1.350 
usec. 

Destination Times
Data Manipulation
  Mode
  0  R                               0        2        0.600   (1)
  1  @R or (R)                       1        3        1.500
  2  (R)+                            1        5        1.500
  3  @(R)+                           2        7        2.700
  4  -(R)                            1        4        1.500
  5  @-(R)                           2        6        2.700
  6  X(R)                            2        7        2.700
  7  @X(R)                           3        9        3.900

NOTE: 

Odd-byte addressing (DM1-7) adds 7 fast shift (0.150 usec/ucycle) and 1 regular ucycle for a total of +1.350 
usec. 

(1) MOV subtracts 1 ucycle (-0.300 usec). 

Jump (JMP/JSR)
  Mode
  0  R                          ILLEGAL
  1  @R or (R)                       1        1        0.900
  2  (R)+                            1        3        0.900
  3  @(R)+                           2        5        2.100
  4  -(R)                            1        2        0.900
  5  @-(R)                           2        4        2.100
  6  X(R)                            2        5        2.100
  7  @X(R)                           3        7        3.300

                                Memory   Memory   Micro-
                                 Reads   Writes   Cycles  Time (usec)   Notes

Execute Times
Instruction

Double Operand
  ADD, SUB, BIC(B), BIS(B)                    1        4        1.800   (1)
  CMP(B), BIT(B)                              0        2        0.600
  MOV(B)                                      1        4        1.800   (1)

Single Operand
  CLR(B), COM(B), INC(B),                     1        5        2.100   (1)
  DEC(B), NEG(B), ADC(B),
  SBC(B), ROR(B), ROL(B),
  ASR(B), ASL(B)
  TST(B)                                      0        3        0.900
  SWAB                                        1       12        3.150   (1,2)

Branch
  All Branches (true)                                  3        0.900
  All Branches (false)                                 1        0.300

Jump
  JMP                                                  2        0.600
  JSR                                         1        9        3.300

Control, Trap, and Miscellaneous
  Set/Clear Condition Codes                            3        0.900
  RTS                                1                 7        2.100
  RTI                                2                 9        2.700
  IOT, EMT, TRAP, BPT                2        2       13        6.300

NOTES: 

(1) Destination odd-byte addressing (DM1-7) adds 7 fast shift ucycles (0.150 usec/ucycle) for a total of 
+1.050 usec. DM0 subtracts 2 ucycles and memory write (-1.200 usec). 

(2) Byte swap consists of 7 fast shift (0.150 usec/ucycle) and 1 regular ucycle for a total of +1.350 usec. 

NOTE: 

Times given apply to the M7261 microprogram module, revision R. Earlier versions use additional ucycles. 

Times Assumed for All Calculations 

1) Microcycle time is 0.300 usec. 

2) A CKOFF following a DATI/DATIP/DATO/DATOB extends ucycle time by 0.600 usec minus 0.300 usec for 
each ucycle that the CKOFF is removed from the cycle initiating the bus transaction. 
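
The CKOFF rule in assumption 2 can be expressed as a small function. The sketch below is an interpretation rather than the author's calculation: clamping the extension at zero is an assumption, and the number of ucycles separating each CKOFF from its bus transaction is inferred here so as to reproduce the listed times, since it is not tabulated in this appendix.

```python
CYCLE_USEC = 0.300


def ckoff_extension(cycles_removed):
    """Extension in usec for a CKOFF placed cycles_removed ucycles after the
    cycle that started the bus transaction (clamped at zero by assumption)."""
    return max(0.0, 0.600 - 0.300 * cycles_removed)


def phase_time(cycles, removals):
    """Phase time: plain microcycles plus one CKOFF extension per access."""
    return cycles * CYCLE_USEC + sum(ckoff_extension(k) for k in removals)


# Inferred CKOFF placements that reproduce three listed entries.
fetch = phase_time(5, [2])          # listed 1.500: access fully overlapped
source_mode_1 = phase_time(3, [0])  # listed 1.500: no overlap
source_mode_4 = phase_time(4, [1])  # listed 1.500: one cycle overlapped

print(f"fetch {fetch:.3f}, source mode 1 {source_mode_1:.3f}, "
      f"source mode 4 {source_mode_4:.3f} usec")
```

Each ucycle placed between the start of a bus transaction and its CKOFF recovers 0.300 usec, which is how the 11/10 hides part of its large microcycle count behind memory accesses.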

Appendix E: PDP-11/20 Instruction Execution Times 

Microcycle Time = 0.280 Microseconds 

                                Memory   Micro-
                                 Reads   Cycles  Time (usec)

Fetch Time                           1        4        1.490

Source Times
  Mode
  0  R                               0        0        0.000
  1  @R or (R)                       1        4        1.490
  2  (R)+                            1        4        1.490
  3  @(R)+                           2        7        2.700
  4  -(R)                            1        4        1.490
  5  @-(R)                           2        7        2.700
  6  X(R)                            2        7        2.700
  7  @X(R)                           3       10        3.910

NOTE: 

Odd-byte addressing (SM1-7) adds 2 ucycles (+0.560 usec). 

Destination Times
Data Manipulation
  Mode
  0  R                               0        1        0.280
  1  @R or (R)                       1        4        1.390
  2  (R)+                            1        4        1.390
  3  @(R)+                           2        7        2.600
  4  -(R)                            1        4        1.390
  5  @-(R)                           2        7        2.600
  6  X(R)                            2        7        2.600
  7  @X(R)                           3       10        3.810

NOTES: 

Odd-byte addressing (DM1-7) adds 2 ucycles (+0.560 usec). 
Non-modifying instruction (CMP(B), BIT(B), TST(B)) adds 0 ucycles (+0.100 usec for DATI in place of DATIP). 

Jump (JMP/JSR)
  Mode
  0  R                          ILLEGAL
  1  @R or (R)                       0        4        1.120
  2  (R)+                            0        4        1.120
  3  @(R)+                           1        7        2.330
  4  -(R)                            0        4        1.120
  5  @-(R)                           1        7        2.330
  6  X(R)                            1        7        2.330
  7  @X(R)                           2       10        3.540

                                Memory   Memory   Micro-
                                 Reads   Writes   Cycles  Time (usec)   Notes

Execute Times
Instruction

Double Operand
  ADD, SUB, BIS(B), MOV(B)                    1        3        0.840   (1)
  BIC(B)                                      1        5        1.400   (1)
  BIT(B)                                      0        4        1.120
  CMP(B)                                      0        2        0.560

Single Operand
  CLR(B), COM(B), INC(B),                     1        3        0.840   (1)
  DEC(B), NEG(B), ADC(B),
  SBC(B)
  ROR(B), ROL(B), ASR(B),                     1        3        0.840   (1,2)
  ASL(B)
  TST(B)                                      0        2        0.560
  SWAB                                        1        3        0.840   (1)

Branch
  All Branches (true)                                  4        1.120
  All Branches (false)                                 0        0.000

Jump
  JMP                                                  0        0.000
  JSR                                         1       10        2.800

Control, Trap, and Miscellaneous
  Set/Clear Condition Codes                            0        0.000
  RTS                                1                 6        2.050
  RTI                                2                 9        3.260
  IOT, EMT, TRAP, BPT                2        2       21        6.620

NOTES: 

(1) DM0 subtracts 1 ucycle and memory write (-0.280 usec). PS as destination adds 1 ucycle (+0.280 usec). 

(2) Odd-byte addressing (DM1-7) adds 2 ucycles (+0.560 usec). 

Times Assumed for All Calculations 

1) Microcycle time is 0.280 usec. 

2) Microcycle time is extended by 0.370 usec during DATI. 

3) Microcycle time is extended by 0.270 usec during DATIP. 

4) Microcycle time is extended by 0.000 usec during DATO/DATOB. 

Appendix F: PDP-11/34 Instruction Execution Times 

Microcycle Time = 0.180/0.240 Microseconds 

                                Memory   Micro-
                                 Reads   Cycles  Time (usec)   Notes

Fetch Time                           1        3        1.630

Source Times
  Mode
  0  R                               0        1        0.180
  1  @R or (R)                       1        1        1.120
  2  (R)+                            1        2        1.300
  3  @(R)+                           2        3        2.420
  4  -(R)                            1        2        1.300
  5  @-(R)                           2        3        2.420
  6  X(R)                            2        4        2.600
  7  @X(R)                           3        5        3.720

NOTE: 

(1) DM0 subtracts 1 ucycle (-0.180 usec). 

Destination Times
Data Manipulation
  Mode
  0  R                               0        1        0.180   (1,2)
  1  @R or (R)                       1        1        1.120
  2  (R)+                            1        2        1.300   (1)
  3  @(R)+                           2        3        2.420
  4  -(R)                            1        2        1.300
  5  @-(R)                           2        3        2.420
  6  X(R)                            2        4        2.600
  7  @X(R)                           3        5        3.720

NOTES: 

MOV(B) and DM1-7 changes long to short ucycle and subtracts memory read (-1.000 usec). 

(1) MOV(B) subtracts an additional ucycle (-0.180 usec). 

(2) Single-operand instruction except NEG(B) subtracts 1 ucycle (-0.180 usec). 

Jump (JMP/JSR)
  Mode
  0  R                          ILLEGAL
  1  @R or (R)                       0        0        0.000   (1)
  2  (R)+                            0        2        0.360
  3  @(R)+                           1        2        1.300
  4  -(R)                            0        1        0.180
  5  @-(R)                           1        2        1.300
  6  X(R)                            1        2        1.300   (1)
  7  @X(R)                           2        4        2.600

NOTE: 

(1) JSR adds 1 ucycle (+0.180 usec). 

                                Memory   Memory   Micro-
                                 Reads   Writes   Cycles  Time (usec)   Notes

Execution Time
Instruction

Double Operand
  ADD, SUB, BIC(B), BIS(B),                   1        1        0.780   (1)
  MOV(B), XOR
  CMP(B), BIT(B)                              0        1        0.180

Single Operand
  CLR(B), COM(B), INC(B),                     1        1        0.780   (1)
  DEC(B), ADC(B), SBC(B),
  SXT
  NEG(B)                                      1        2        0.960   (1)
  ROR(B), ROL(B), ASR(B),                     1        2        0.960   (2)
  ASL(B)
  TST(B)                                      0        1        0.180
  SWAB                                        1        1        0.780   (2)

Branch
  All Branches (true)                                  3        0.540
  All Branches (false)                                 0        0.000
  SOB (true)                                           4        0.780
  SOB (false)                                          2        0.420

Jump
  JMP                                                  1        0.180
  JSR                                         1        5        1.500

Control, Trap, and Miscellaneous
  Set/Clear Condition Codes                            2        0.360
  MARK                               1                 8        2.380
  RTS                                1                 4        1.660
  RTI, RTT                           2                 6        2.960
  IOT, EMT, TRAP, BPT                2        2       13        5.420

NOTES: 

(1) DM0 subtracts memory write and changes long to short ucycle (-0.600 usec). 

(2) DM0 subtracts memory write, changes long to short ucycle, and adds 1 ucycle (-0.420 usec). 

Times Assumed for All Calculations 

1) Microcycle times are 0.180 and 0.240 usec. 

2) Microcycle time is extended by 0.150 usec by bus priority arbitration delay during BUT SERVICE. 

3) Microcycle time is extended by 0.940 usec during DATI/DATIP (MOS memory). 

4) Microcycle time is extended by 0.540 usec during DATO/DATOB (MOS memory). 

5) Memory management unit delay is not included (+0.120 usec/memory cycle when enabled). 

Appendix G: PDP-11/40 Instruction Execution Times 

Microcycle Time = 0.140/0.200/0.300 Microseconds 

                                Memory   Micro-
                                 Reads   Cycles  Time (usec)

Fetch Time                           1        4        1.120

NOTE: 

Execute phase of previous instruction may be overlapped with fetch. Consult execute phase notes for effect 
on timing. 

Source Times
  Mode
  0  R                               0
  1  @R or (R)                       1
  2  (R)+                            1
  3  @(R)+                           2
  4  -(R)                            1
  5  @-(R)                           2
  6  X(R)                            2
  7  @X(R)                           3

NOTE: 

Odd-byte addressing (SM1-7) adds 2 ucycles (+0.340 usec). 

Destination Times
Data Manipulation (except MOV(B))
  Mode
  0  R                               0        0
  1  @R or (R)                       1        3
  2  (R)+                            1        3
  3  @(R)+                           2        5
  4  -(R)                            1        3
  5  @-(R)                           2        5
  6  X(R)                            2        5
  7  @X(R)                           3        7

NOTES: 

Odd-byte addressing (DM1-7) adds 2 ucycles (+0.340 usec). 

(1) Single-operand instruction or SM0 subtracts 0 ucycles (-0.440 usec). 

MOV(B)
  Mode
  0  R
  1  @R or (R)
  2  (R)+
  3  @(R)+
  4  -(R)
  5  @-(R)
  6  X(R)
  7  @X(R)

NOTE: 

(1) SM0 subtracts 0 ucycles (-0.440 usec). 

                                Memory   Memory   Micro-
                                 Reads   Writes   Cycles  Time (usec)   Notes

Jump (JMP/JSR)
  Mode
  0  R                          ILLEGAL
  1  @R or (R)                       0                 2        0.340
  2  (R)+                            0                 3        0.640
  3  @(R)+                           1                 2        0.940
  4  -(R)                            0                 2        0.440
  5  @-(R)                           1                 2        0.940
  6  X(R)                            1                 4        0.840
  7  @X(R)                           2                 4        1.340

Execution Time
Instruction

Double Operand
  ADD, BIC(B), BIS(B), XOR                    1        3        0.540   (1,2)
  SUB                                         1        4        0.680   (1)
  CMP(B), BIT(B)                              0        3        0.480   (3)
  MOV(B)                                      1        3        0.640   (4)

Single Operand
  CLR(B), COM(B), INC(B),                     1        4        0.620   (1,2)
  DEC(B), ADC(B), SBC(B),
  ROL(B), ASL(B), SXT
  NEG(B)                                      1        3        0.540   (1,2)
  ROR(B), ASR(B)                              1        4        0.840   (5)
  TST(B)                                      0        4        0.620   (1,2)
  SWAB                                        1        3        0.540   (1)

Branch
  All Branches (true)                                  3        0.640
  All Branches (false)                                 2        0.280
  SOB (true)                                           5        1.240
  SOB (false)                                          5        0.920

Jump
  JMP                                                  2        0.340
  JSR                                         1        6        1.480

Control, Trap, and Miscellaneous
  Set Condition Codes                                  2        0.600
  Clear Condition Codes                                3        0.900
  MARK                               1                 6        1.540
  RTS                                1                 4        1.280
  RTI, RTT                           2                 6        2.320
  IOT, EMT, TRAP, BPT                2        2       14        4.180

NOTES: 

If (single-operand instruction or SM0 and double-operand instruction except MOVB), DM0, destination ≠ 
register 7, and no service request pending, then next fetch is overlapped (-1 ucycle/-0.640 usec from 
next fetch). 

(1) If DM0, phase takes 3 ucycles and memory write is not done (0.480 usec). 

(2) If odd-byte addressing (DM1-7), phase takes 5 ucycles (1.020 usec). 

(3) If odd-byte addressing (DM1-7), phase takes 5 ucycles (0.820 usec). 

(4) If byte instruction and DM1-7, phase takes 4 ucycles (0.880 usec). For DM0: If word instruction, phase 
takes 2 ucycles (0.340 usec). If byte instruction, phase takes 4 ucycles (0.680 usec). 

(5) For DM0: If word instruction, phase takes 3 ucycles (0.740 usec). If byte instruction, phase takes 4 
ucycles (0.880 usec). In neither case is memory write done. 

Times Assumed for All Calculations 

1) Microcycle times are 0.140, 0.200, and 0.300 usec. 

2) A CLKOFF following a DATI/DATIP extends ucycle time by 0.500 usec minus sum of cycle times between 
DATI/DATIP (exclusive) and CLKOFF (inclusive). 

3) A CLKOFF following a DATO/DATOB extends ucycle time by 0.200 usec minus sum of cycle times between 
DATO/DATOB (exclusive) and CLKOFF (inclusive). 

4) Memory management unit delay is not included (+0.150 usec/memory cycle when enabled). 

Appendix H: PDP-11/45 Instruction Execution Times 

Microcycle Time = 0.150 Microseconds 

                                Memory   Micro-
                                 Reads   Cycles  Time (usec)   Notes

Fetch Time                           1        3        0.450

NOTE: 

Execute phase of previous instruction may be overlapped with fetch. Consult execute phase notes for effect 
on timing. 

Source Times
  Mode
  0  R                               0        0        0.000
  1  @R or (R)                       1        2        0.300
  2  (R)+                            1        2        0.300
  3  @(R)+                           2        5        0.750
  4  -(R)                            1        3        0.450
  5  @-(R)                           2        6        0.900
  6  X(R)                            2        4        0.600
  7  @X(R)                           3        7        1.050

Destination Times
Data Manipulation
  Mode
  0  R                               0        0        0.000
  1  @R or (R)                       1        2        0.300
  2  (R)+                            1        2        0.300
  3  @(R)+                           2        5        0.750
  4  -(R)                            1        3        0.450
  5  @-(R)                           2        6        0.900
  6  X(R)                            2        5        0.750
  7  @X(R)                           3        8        1.200

NOTE: 

MOV and DM1-7 subtracts memory read (-0.000 usec). 
Odd-byte addressing (DM1-7) adds 1 ucycle (+0.150 usec). 

(1) Single-operand instruction or SM0 subtracts 1 ucycle (-0.150 usec). 

Jump (JMP/JSR)
  Mode
  0  R                          ILLEGAL
  1  @R or (R)                       0        2        0.300
  2  (R)+                            0        2        0.300
  3  @(R)+                           1        4        0.600
  4  -(R)                            0        2        0.300
  5  @-(R)                           1        5        0.750
  6  X(R)                            1        3        0.450
  7  @X(R)                           2        6        0.900
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Memory Memory Micro- 

Roads Writes Cycles Time (usee) Notes 


Execution Time 
Instruction 


Double Operand 
ADD, SUB, BIC(B), BIS(B), 
MOVB, XOR 
CMP(B>, BIT(B> 

MOV 


Single Operand 
CLR(B), COM(B), INC(B), 
DEC<B), ADC(B), SBC(B), 
ROL(B), ASL(B), SWAB, 
SXT 
NEG(B) 

ROR(B), ASR(B) 

TST(B> 


Branch 

All Branches (true) 
All Branches (false) 
S08 (true) 

SOB (false) 


Jump 

JMP 

JSR 


Control, Trap, and Miscellaneous 
Set/Clear Condition Codes 


MARK 1 

RTS 1 

RTI, RTT 2 

BPT 2 

IOT 2 

EMT, TRAP 2 


I 

2 

0.300 

(i) 

0 

1 

0.150 

(1,2) 

1 

0 

0.000 

(1,3) 

1 

2 

0.300 

(1) 


1 

4 

0.600 

(4) 

1 

2 

0.300 

(1,5) 

0 

L 

0.150 

(1,2) 


1 

0.150 


0 

0.000 

(6) 

3 

0.450 

(6) 

2 

0.300 

(6) 



1 

0.150 

1 

5 

0.750 



1 

0.150 



4 

0.600 

(6) 


4 

0.600 



7 

1.050 


2 

12 

1.800 


2 

11 

1.650 


2 

13 

1.950 




70 


NOTES:

(1) For DM0:

    If double-operand instruction, destination ≠ register 7, and SM1-7:

        If odd-byte addressing, then phase takes 2 ucycles (0.300 usec), else phase takes 1 ucycle (0.150
        usec). If no service request is pending, then next fetch is overlapped (-1 ucycle/-0.150 usec
        from next fetch).

    If double-operand instruction, destination = register 7, and SM1-7:

        Phase takes 2 ucycles (0.300 usec).

    Otherwise (single-operand instruction or SM0):

        Phase takes 1 ucycle (0.150 usec). If destination ≠ register 7 and no service request is pending,
        then next fetch is overlapped (-2 ucycles/-0.300 usec from next fetch).

    No memory write is done.

(2) For DM1-7, if destination fetch is via Fastbus and no service request is pending, then next instruction fetch
    is overlapped (-1 ucycle/-0.150 usec from next fetch).

(3) DM1-2 adds 1 ucycle (+0.150 usec). If no service request is pending, then next fetch is overlapped (-1
    ucycle/-0.150 usec from next fetch).

(4) DM0 subtracts 2 ucycles and memory write (-0.300 usec).

(5) Odd-byte addressing adds 1 ucycle (+0.150 usec).

(6) If no service request is pending, then next fetch is overlapped (-1 ucycle/-0.150 usec from next fetch).


Times Assumed for All Calculations 


1) Microcycle time is 0.150 usec.

2) Memory access time does not influence microcycle times (bipolar memory). 

3) Memory management unit delay is not included (+0.090 usec/memory cycle when enabled). 
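
Under these assumptions, each tabulated time is the microcycle count multiplied by the microcycle time. The following is a minimal Python sketch of that arithmetic, not part of the original tables. The function and parameter names are illustrative, and it assumes the memory management delay is simply added once per memory cycle (read or write) when enabled.

```python
MICROCYCLE_USEC = 0.150   # assumption 1 above
MMU_DELAY_USEC = 0.090    # assumption 3 above, per memory cycle when enabled


def phase_time(ucycles, memory_cycles=0, mmu_enabled=False):
    """Return the time in usec for one instruction phase.

    ucycles        -- microcycle count taken from a table row
    memory_cycles  -- memory reads plus writes in that phase
    mmu_enabled    -- add the memory management delay per memory cycle
    """
    time = ucycles * MICROCYCLE_USEC
    if mmu_enabled:
        time += memory_cycles * MMU_DELAY_USEC
    return round(time, 3)


# JMP/JSR destination, mode 7 (@X(R)): 2 memory reads, 6 ucycles.
print(phase_time(6, memory_cycles=2))                    # 0.9
print(phase_time(6, memory_cycles=2, mmu_enabled=True))  # 1.08
```

With memory management disabled the result reproduces the tabulated 0.900 usec. Fetch overlap and the per-note adjustments are deliberately left out of this sketch.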
Appendix I: PDP-11/60 Instruction Execution Times


Microcycle Time = 0.170 Microseconds

                       Memory   Micro-
                       Reads    Cycles   Time (usec)   Notes

Fetch Time               1        3        0.510

NOTES:

The following instructions take 1 additional ucycle (+0.170 usec) to decode: XOR, SWAB, SXT, JSR, set/clear
condition codes, MARK, SOB, RTS, RTI, RTT, IOT, EMT, TRAP, BPT, MFPI(D), MTPI(D).

Fetch or execute phase of previous instruction may be overlapped with fetch. Consult execute phase notes
for effect on timing.
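
The fetch rule above can be sketched in Python as a lookup: 3 ucycles for every instruction, plus 1 ucycle for those in the decode list. This is an illustrative sketch only. The label "CCODE" is a hypothetical stand-in for the set/clear condition code group, MFPI(D) and MTPI(D) are expanded to their I and D forms, and overlap with the previous instruction is ignored.

```python
MICROCYCLE_USEC = 0.170
BASE_FETCH_UCYCLES = 3  # Fetch Time row: 1 memory read, 3 ucycles

# Instructions listed in the note as needing 1 extra decode ucycle.
# "CCODE" is a hypothetical label for the set/clear condition code group.
EXTRA_DECODE = {
    "XOR", "SWAB", "SXT", "JSR", "CCODE", "MARK", "SOB", "RTS", "RTI",
    "RTT", "IOT", "EMT", "TRAP", "BPT", "MFPI", "MFPD", "MTPI", "MTPD",
}


def fetch_time(mnemonic):
    """Return (ucycles, usec) for the fetch phase, ignoring overlap."""
    extra = 1 if mnemonic.upper() in EXTRA_DECODE else 0
    ucycles = BASE_FETCH_UCYCLES + extra
    return ucycles, round(ucycles * MICROCYCLE_USEC, 3)


print(fetch_time("ADD"))  # (3, 0.51)
print(fetch_time("JSR"))  # (4, 0.68)
```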


Source Times

                       Memory   Micro-
Mode                   Reads    Cycles   Time (usec)

0   R                    0        0        0.000
1   @R or (R)            1        2        0.340
2   (R)+                 1        2        0.340
3   @(R)+                2        5        0.850
4   -(R)                 1        3        0.510
5   @-(R)                2        6        1.020
6   X(R)                 2        4        0.680
7   @X(R)                3        7        1.190

NOTE:

For SM1-7: Word instruction except MOV and DM1-7 adds 1 ucycle (+0.170 usec). Byte instruction adds 2
ucycles (+0.340 usec).

Destination Times

Data Manipulation (except MOV(B) and MTPI(D))

                       Memory   Micro-
Mode                   Reads    Cycles   Time (usec)

0   R                    0        0        0.000
1   @R or (R)            1        2        0.340
2   (R)+                 1        2        0.340
3   @(R)+                2        5        0.850
4   -(R)                 1        3        0.510
5   @-(R)                2        6        1.020
6   X(R)                 2        5        0.850
7   @X(R)                3        8        1.360

NOTES:

Byte addressing (DM1-7) adds 2 ucycles (+0.340 usec).

(1) Single-operand instruction except SWAB or SXT or SM0 and double-operand instruction except XOR
    subtracts 1 ucycle (-0.170 usec).
MOV(B) and MTPI(D)

                       Memory   Memory   Micro-
Mode                   Reads    Writes   Cycles   Time (usec)   Notes

0   R                    0                 0        0.000
1   @R or (R)            0                 0        0.000
2   (R)+                 0                 1        0.170
3   @(R)+                1                 3        0.510
4   -(R)                 0                 1        0.170       (1)
5   @-(R)                1                 4        0.680
6   X(R)                 1                 3        0.510       (2)
7   @X(R)                2                 6        1.020       (2)

NOTES:

MOVB, SM0, and DM1-7 adds 2 ucycles (+0.340 usec) for even byte, 3 ucycles (+0.510 usec) for odd byte.

(1) MOV and SM0 adds 1 ucycle (+0.170 usec).

(2) MOV(B) and SM0 subtracts 1 ucycle (-0.170 usec).




Jump (JMP/JSR)

                       Memory   Micro-
Mode                   Reads    Cycles   Time (usec)

0   R                  ILLEGAL
1   @R or (R)            0        1        0.170
2   (R)+                 0        2        0.340
3   @(R)+                1        2        0.340
4   -(R)                 0        1        0.170
5   @-(R)                1        3        0.510
6   X(R)                 1        2        0.340
7   @X(R)                2        5        0.850



Execute Times

                                  Memory   Memory   Micro-
Instruction                       Reads    Writes   Cycles   Time (usec)   Notes

Double Operand
  ADD, MOV                                   1        2        1.170       (1,6)
  SUB                                        1        3        1.340       (1,7)
  BIC(B), BIS(B)                             1        2        1.170       (1,6,12)
  CMP(B)                                     0        1        0.170       (1,11)
  BIT(B)                                     0        1        0.170       (1)
  MOVB                                       1        2        1.170       (4)
  XOR                                        1        3        1.340       (7)

Single Operand
  CLR(B), COM(B)                             1        3        1.340       (2,7)
  INC(B), DEC(B), ADC(B),
  ROL(B), ASL(B)                             1        3        1.340       (2,7,8)
  NEG(B)                                     1        4        1.510       (7,8)
  SBC(B)                                     1        4        1.510       (6,8)
  ROR(B)                                     1        4        1.510       (6)
  ASR(B)                                     1        5        1.680       (7,9)
  TST(B)                                     0        2        0.340       (2,5)
  SWAB                                       1        5        1.680       (7)
  SXT                                        1        6        1.850       (7)
  MFPI(D)                                    1       21        4.400       (13)
  MTPI(D)                           1        1       22        4.570       (6)
                                  Memory   Memory   Micro-
                                  Reads    Writes   Cycles   Time (usec)   Notes

Branch
  All Branches (true)                                 4        0.680       (3)
  All Branches (false)                                2        0.340
  SOB (true)                                         10        1.700       (3)
  SOB (false)                                         7        1.190

Jump
  JMP                                                 1        0.170
  JSR                                        1        6        1.850       (3,10)

Control, Trap, and Miscellaneous
  Set/Clear Condition Codes                           8        1.190
  MARK                              1                 9        1.530
  RTS                               1                 4        0.680
  RTI                               2                10        1.700
  RTT                               2                19        3.230
  IOT, EMT, TRAP, BPT               2        2       22        5.400       (14)


NOTES:

(1) If SM0, DM0, source ≠ register 7, and destination ≠ register 7, then fetch overlap is attempted. If no
    service request is pending at conclusion of instruction, then next fetch is overlapped (-2 ucycles/-0.340
    usec from next fetch); otherwise, add 2 ucycles (+0.340 usec) to service phase following instruction for
    PC rollback, add 1 memory read (+0.000 usec) to next fetch for instruction refetch.

(2) If DM0 and destination ≠ register 7, then fetch overlap is attempted. If no service request is pending at
    conclusion of instruction, then next fetch is overlapped (-2 ucycles/-0.340 usec from next fetch);
    otherwise, add 2 ucycles (+0.340 usec) to service phase following instruction for PC rollback, add 1
    memory read (+0.000 usec) to next fetch for instruction refetch.

(3) If no service request is pending, then next fetch is overlapped (-2 ucycles/-0.340 usec from next fetch);
    otherwise, subtract 1 ucycle (-0.170 usec) from execute.

(4) For DM0: SM0 subtracts memory write (-0.830 usec). SM1-7 subtracts 1 ucycle and memory write
    (-1.000 usec).

(5) DM0 subtracts 1 ucycle (-0.170 usec).

(6) DM0 subtracts 1 ucycle and memory write (-1.000 usec).

(7) DM0 subtracts 2 ucycles and memory write (-1.170 usec).

(8) DM1-7 and byte addressing adds 1 ucycle (+0.170 usec).

(9) DM1-7 and byte addressing adds 3 ucycles (+0.510 usec).

(10) DM3, 5-7 adds 1 ucycle (+0.170 usec).

(11) SM1-7, DM0, and word addressing adds 1 ucycle (+0.170 usec).

(12) SM0, DM1-7, and byte addressing adds 1 ucycle (+0.170 usec).

(13) SM0 adds 1 ucycle (+0.170 usec).

(14) If new PC odd: Microcontrol transfers to writable control store if present and instruction timing does not
     apply; otherwise, trap sequence continues normally with 3 extra ucycles (+0.510 usec).
NOTE:

Accessing the following internal addresses invokes microcode which adds additional microcycles in all phases:

772300-16   Kernel Page Descriptor Registers
772340-56   Kernel Page Address Registers
777540      Writable Control Store Status Register
777542      Writable Control Store Address Register
777544      Writable Control Store Data Register
777570      Console Switch and Display Register
777572      Memory Management Status Register 0
777574      Memory Management Status Register 1
777576      Memory Management Status Register 2
777600-16   User Page Descriptor Registers
777640-56   User Page Address Registers
777744      Memory System Error Register
777746      Cache Control Register
777752      Cache Hit/Miss Register
777766      CPU Error Register
777770      Microprogram Break Register
777774      Stack Limit Register
777776      Processor Status Word


Times Assumed for All Calculations 


1) Microcycle time is 0.170 usec.

2) Microcycle time is extended by 0.000 usec during DATI/DATIP with cache hit (all tabulated times assume
   cache hit on read).

3) Microcycle time is extended by 1.075 usec during DATI/DATIP with cache miss.

4) Microcycle time is extended by 0.830 usec during DATO/DATOB.

5) Memory management unit adds no delay when enabled.
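
These assumptions give a simple timing model for the PDP-11/60 tables: each microcycle costs 0.170 usec, and each memory reference extends one microcycle by the amount listed. The Python sketch below is illustrative only. The names are hypothetical, and it assumes each read or write extends exactly one microcycle, which matches the tabulated rows it is checked against but is not stated explicitly in the tables.

```python
MICROCYCLE_USEC = 0.170
READ_HIT_EXT_USEC = 0.000    # DATI/DATIP with cache hit
READ_MISS_EXT_USEC = 1.075   # DATI/DATIP with cache miss
WRITE_EXT_USEC = 0.830       # DATO/DATOB


def phase_time(ucycles, reads=0, writes=0, read_misses=0):
    """Time in usec for one phase; read_misses of the reads miss the cache."""
    if read_misses > reads:
        raise ValueError("read_misses cannot exceed reads")
    hits = reads - read_misses
    time = (ucycles * MICROCYCLE_USEC
            + hits * READ_HIT_EXT_USEC
            + read_misses * READ_MISS_EXT_USEC
            + writes * WRITE_EXT_USEC)
    return round(time, 3)


# Rows from the execute table, all assuming cache hits on reads.
print(phase_time(2, writes=1))            # ADD, MOV: 1.17
print(phase_time(22, reads=1, writes=1))  # MTPI(D): 4.57
print(phase_time(22, reads=2, writes=2))  # IOT, EMT, TRAP, BPT: 5.4
# Source mode 3, @(R)+: 2 reads, 5 ucycles, with both reads missing the cache.
print(phase_time(5, reads=2, read_misses=2))  # 3.0
```

The last call shows how far a cache miss moves a phase away from its tabulated value: the 0.850 usec source time for mode 3 becomes 3.000 usec when both reads miss.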